feat(graph): SchemaField nodes + MirrorsField heuristic edges (T4-7)#291

Merged
coseto6125 merged 1 commit into main from feat/t4-7-mirrors-field
May 21, 2026
@coseto6125
Owner

Summary

Closes the dead-data gap from T4-2/T4-3/T4-4: detector outputs (RawSchemaField) are now promoted to actual graph nodes + edges, unlocking the user-visible value of schema cross-binding.

What lands:

  • New crates/ecp-analyzer/src/post_process/schema_field_mirrors.rs runs after class_membership/overrides. It promotes each RawSchemaField to a Node { kind: SchemaField, ... } plus a HasProperty edge from the owning class. It then buckets fields by (name.to_lowercase(), SchemaType) and emits pairwise MirrorsField heuristic edges (confidence 0.9) wherever the 4-point strict rubric is satisfied. Under D3 cluster semantics, a bucket of k>=3 fields with uniform (name, type, owner_class) emits every pair.
  • Refactor: RawSchemaField.name / owner_class switched from StrRef to Box<str>. The pre-T4-7 design interned into a per-file StringPool that the parser dropped at scope exit, leaving the StrRefs pointing to deallocated memory. Owning the string sidesteps the bug entirely (~16 B extra per field, no pool plumbing).
  • No GRAPH_FORMAT_VERSION bump — NodeKind::SchemaField and RelType::MirrorsField already exist on main (T0-1 / T-H1 shipped). Only addition is Hash derive on SchemaType for the bucket key.
  • 7 new integration tests covering pair-match / 3-way cluster / different-owner drop / structural HasProperty direction / heuristic-flag invariant / empty-fast-path.
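
The bucketing step described above can be sketched roughly as follows. This is a minimal illustration, not the code in schema_field_mirrors.rs: the `SchemaType` variants, the trimmed-down `RawSchemaField`, and the `mirror_pairs` helper are all hypothetical, and the 4-point rubric is reduced to a single assumed check (matching `owner_class`), since the full rubric is not spelled out here.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the analyzer's `SchemaType`; the variants are assumptions.
/// `Hash` is derived so the type can be part of the bucket key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum SchemaType {
    Int,
    Str,
}

/// Simplified stand-in for `RawSchemaField` (only the fields the sketch needs).
#[derive(Debug, Clone)]
struct RawSchemaField {
    name: Box<str>,
    owner_class: Box<str>,
    ty: SchemaType,
}

/// Buckets fields by (lowercased name, type) and returns every index pair
/// (i, j) with i < j inside a bucket whose owners match. A bucket of k
/// compatible fields therefore yields k * (k - 1) / 2 pairs.
fn mirror_pairs(fields: &[RawSchemaField]) -> Vec<(usize, usize)> {
    let mut buckets: HashMap<(String, SchemaType), Vec<usize>> = HashMap::new();
    for (i, f) in fields.iter().enumerate() {
        buckets
            .entry((f.name.to_lowercase(), f.ty))
            .or_default()
            .push(i);
    }

    let mut pairs = Vec::new();
    for members in buckets.values() {
        for (pos, &i) in members.iter().enumerate() {
            for &j in &members[pos + 1..] {
                // Assumed stand-in for the strict rubric: owners must match.
                if fields[i].owner_class == fields[j].owner_class {
                    pairs.push((i, j));
                }
            }
        }
    }
    // HashMap iteration order is unspecified, so sort for deterministic output.
    pairs.sort_unstable();
    pairs
}
```

In the real pass each returned pair would become a MirrorsField edge flagged as heuristic; the sketch stops at the pair list to stay independent of the graph types.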

Acknowledged v1 gaps

  • BlindSpot emission for 3/4 partial matches is deferred. Tracked by ignored test_partial_match_emits_blindspot + Phase 2 docs in schema_field_mirrors.rs.

Test plan

coseto6125 enabled auto-merge (squash) May 21, 2026 16:53
coseto6125 added a commit that referenced this pull request May 21, 2026
Ubuntu CI was hitting os error 28 (No space left on device) at link
time — 14 tree-sitter parsers × 30+ test binaries × default debug-info
filled the runner's 14 GB free space (last seen on PR #291 run 26240379100,
tantivy.rlib link failed mid-cascade with callmeta_c_cpp misreported as
the failing test).

Two stacked mitigations:

1. Ubuntu-only `jlumbroso/free-disk-space` step before checkout — drops
   tool-cache + android + dotnet + haskell, reclaiming ~30 GB. macOS &
   windows have 50+ GB free by default; skipped there.
2. Job-level `CARGO_PROFILE_TEST_DEBUG=0` — shrinks target/debug/deps/
   roughly 70%. Assertions stay on; only DWARF data is dropped.
coseto6125 merged commit 4ef44ce into main May 21, 2026
17 of 18 checks passed
coseto6125 deleted the feat/t4-7-mirrors-field branch May 21, 2026 17:50
coseto6125 added a commit that referenced this pull request May 21, 2026
Three recent PRs landed on main with overlapping schema changes but
were not rebased against each other, leaving main in a state that
fails to compile (CI red on 162c52d):

- #285 (T1-4 + T1-5 + T1-11) — `Node.uid: StrRef → u64`,
  added `Node.owner_class: StrRef`, made `uid::compute` the canonical
  UID source.
- #291 (T4-7 SchemaField + MirrorsField) — added
  `post_process/schema_field_mirrors.rs`, switched
  `RawSchemaField.{name, owner_class}` from `StrRef` to `Box<str>`.
- #292 (T7-2 per-symbol content_hash) — added `Node.content_hash: u64`.

The five resulting compile errors are mechanical: each call site needs
to be brought to the shape the PR author would have written if they
had rebased against the other two. This commit applies that
reconciliation without rewriting history (no force-push to main).

- `post_process/schema_field_mirrors.rs:99-118` — replace the
  `format!()` + `string_pool.add()` UID construction with
  `uid::compute(NodeKind::SchemaField, &path_str, Some(owner_name), field_name)`,
  the T1-5 canonical pattern. Adds the now-required `content_hash: 0`
  (synthetic mirror node, no source span — per T7-2 doc convention) and
  `owner_class: string_pool.add(owner_name)` (T1-11 rename isolation
  key: a SchemaField like `Foo.id` correctly belongs to class `Foo`).
  Side effect: drops one heap allocation per mirror node (no more
  `format!()` String + pool.add round-trip).
- `protobuf/parser.rs:101-110` — replace `pool.add(&field_name)` and
  `pool.add(owner)` (StrRef-returning) with
  `field_name.into_boxed_str()` and `Box::from(owner.as_str())`. Drops
  the now-unused `pool: &mut StringPool` parameter from
  `extract_proto_fields` plus the `StringPool` allocation at the call
  site (3 cosmetic edits in one file).
- `python/parser.rs` — add `use ecp_core::pool::StringPool;` import
  that the existing T5-2 event-topic wire-up at line 1094 already
  assumed.

Verified: `cargo check --workspace --all-targets --all-features`,
`cargo clippy --workspace --all-targets --all-features -- -D warnings`,
and `cargo test --workspace --no-fail-fast` (2805 passed, 15 ignored)
all clean.
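
For readers unfamiliar with the `StrRef` to `Box<str>` switch that the parser fixes above reconcile, here is a minimal sketch of the difference. The `StrRef`, `StringPool`, and `OwnedField` types below are hypothetical stand-ins, not the definitions in ecp_core, and the pool is modeled as a simple index table purely for illustration.

```rust
/// Hypothetical handle into a string pool (assumed to be an index).
#[derive(Debug, Clone, Copy)]
struct StrRef(u32);

/// Hypothetical per-file pool; the real `StringPool` API may differ.
#[derive(Default)]
struct StringPool {
    strings: Vec<String>,
}

impl StringPool {
    fn add(&mut self, s: &str) -> StrRef {
        self.strings.push(s.to_owned());
        StrRef((self.strings.len() - 1) as u32)
    }

    fn get(&self, r: StrRef) -> Option<&str> {
        self.strings.get(r.0 as usize).map(String::as_str)
    }
}

/// Field that owns its strings, mirroring the post-T4-7 shape.
struct OwnedField {
    name: Box<str>,
    owner_class: Box<str>,
}

/// Old shape: the pool lives only for the duration of the parse, so the
/// returned handle has nothing left to resolve against.
fn parse_with_pool(field_name: &str) -> StrRef {
    let mut pool = StringPool::default();
    pool.add(field_name)
    // `pool` is dropped here, taking the string data with it.
}

/// New shape: the field carries its own data, so no pool parameter is needed.
fn parse_owned(field_name: String, owner: &str) -> OwnedField {
    OwnedField {
        name: field_name.into_boxed_str(),
        owner_class: Box::from(owner),
    }
}
```

The owned variant trades a small per-field allocation for the guarantee that the strings outlive the parser scope, which is the trade-off the PR description cites.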
coseto6125 added a commit that referenced this pull request May 21, 2026
… T7-2 (Node content_hash/owner_class)

The merges of #291 (T4-7) and #292 (T7-2) left two downstream files behind.
Required by this branch's pre-push clippy gate; expected to drop on rebase
once fix/main-compile-post-291-292 lands.

- crates/ecp-analyzer/src/protobuf/parser.rs: RawSchemaField now stores
  Box<str>, not StrRef; drop the per-call StringPool plumbing.
- crates/ecp-analyzer/src/post_process/schema_field_mirrors.rs: Node
  gained content_hash + owner_class fields and SchemaField UIDs are now
  computed via uid::compute, not pool-interned format!() strings.