Commit e7f4e7e

fix: add behavioral depth criteria to task acceptance and parity checks (#225)
* fix: add behavioral depth criteria to task acceptance and parity checks

The task graph builder was generating only structural acceptance criteria (symbols exist, signatures match, compiles) while the parity verifier enforced full behavioral equivalence. This gap caused the code-migrator to produce hollow implementations that compiled but didn't actually perform the intended computation, leading to expensive retry loops.

Both buildAcceptanceCriteria() and buildParityChecks() now generate behavioral criteria for every task containing functions:

Acceptance criteria:
- Full implementation required (no stubs/TODOs/placeholders)
- Behavioral equivalence using idiomatic target-language patterns
- Implementation depth must match source complexity

Parity checks:
- All source code paths reachable in target
- No hollow implementations (input-dependent output required)
- Internal call chains wired end-to-end

These criteria apply uniformly regardless of task size: a 30-line hash function gets the same behavioral bar as a 900-line compressor. Criteria use behavioral language (same observable outputs) rather than structural language, so idiomatic rewrites are not penalized.

* fix: clarify guidance to ban unsafe and prioritize safety over performance
1 parent 6730a60 commit e7f4e7e

3 files changed

Lines changed: 128 additions & 2 deletions

src/core/task-graph-builder.ts

Lines changed: 39 additions & 0 deletions
@@ -1690,6 +1690,27 @@ function buildAcceptanceCriteria(cluster: Cluster): string[] {
   }
   criteria.push('Call-site signatures match upstream dependency contracts');
   criteria.push('Target code compiles without type errors');
+
+  // Behavioral depth: every function body must contain a real implementation,
+  // not stubs, pass-throughs, or hollow wrappers. The target language idioms
+  // may differ from the source (Result vs error code, Vec vs linked list, etc.)
+  // but the *computational depth* must match: if the source performs a non-trivial
+  // transformation (compression, hashing, encoding, decoding, parsing, etc.),
+  // the target must perform an equivalently non-trivial transformation.
+  const hasFunctions = cluster.symbols.some(
+    s => s.kind === 'function' || s.kind === 'method',
+  );
+  if (hasFunctions) {
+    criteria.push(
+      'Every function body is fully implemented — no stubs, TODOs, unimplemented!() macros, or placeholder logic',
+    );
+    criteria.push(
+      'Behavioral equivalence: for all reachable inputs, the migrated code produces the same observable outputs, side effects, and error conditions as the source — using idiomatic target-language patterns (different types, signatures, and error models are expected)',
+    );
+    criteria.push(
+      'Implementation depth matches source complexity — if the source performs a non-trivial transformation (compression, hashing, encryption, codec logic, etc.), the target must perform an equivalently non-trivial computation, not a pass-through or synthetic wrapper',
+    );
+  }
   return criteria;
 }

@@ -1700,5 +1721,23 @@ function buildParityChecks(cluster: Cluster): string[] {
   if (cluster.symbols.some(s => s.kind === 'type' || s.kind === 'class' || s.kind === 'struct')) {
     checks.push('Type definitions preserve public field names and types');
   }
+
+  // Behavioral parity checks — enforce depth without enforcing structural
+  // similarity. These checks tell the parity verifier (and the migrator)
+  // what "correct" means for this task: same behavior, not same shape.
+  const hasFunctions = cluster.symbols.some(
+    s => s.kind === 'function' || s.kind === 'method',
+  );
+  if (hasFunctions) {
+    checks.push(
+      'All source code paths and branches are reachable in the target — no dead dispatch or unreachable algorithm branches',
+    );
+    checks.push(
+      'No hollow implementations: functions must produce non-trivial, input-dependent output matching the source semantics (not zeros, defaults, or pass-through copies)',
+    );
+    checks.push(
+      'Internal call chains are wired end-to-end — public entry points transitively invoke the same algorithmic stages as the source (e.g., a compressor must call encoding, matching, and entropy stages, not return the input with framing)',
+    );
+  }
   return checks;
 }
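
The "no hollow implementations" parity check is a natural-language instruction, so how the verifier enforces it is not visible in this diff. As a rough illustration of what "non-trivial, input-dependent output" could mean in practice, the sketch below probes a byte-to-byte function with a few samples and flags it when every output is identical (zeros or defaults) or when every output equals its input (a pass-through copy). All names here (`ByteFn`, `looksHollow`, the sample functions) are hypothetical and not part of the project, and passing this probe is only a necessary condition, not proof of behavioral equivalence.

```typescript
// Hypothetical probe, not part of the commit: a crude heuristic for the
// "no hollow implementations" parity check.
type ByteFn = (input: Uint8Array) => Uint8Array;

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

/**
 * Returns true when `fn` looks hollow on the given samples: either every
 * output is byte-identical (constant or default output), or every output
 * equals its input (pass-through copy).
 */
function looksHollow(fn: ByteFn, samples: Uint8Array[]): boolean {
  if (samples.length < 2) {
    throw new Error('need at least two distinct samples');
  }
  const outputs = samples.map(s => fn(s));
  // Compare each output with its predecessor; all equal means constant.
  const constant = outputs.every((o, i) => i === 0 || bytesEqual(o, outputs[i - 1]!));
  const passThrough = outputs.every((o, i) => bytesEqual(o, samples[i]!));
  return constant || passThrough;
}

// Illustrative targets (assumptions, chosen only to exercise the probe).
const zeroStub: ByteFn = () => new Uint8Array(4); // always four zero bytes
const passThroughCopy: ByteFn = input => input.slice(); // returns input unchanged
// A simple additive checksum stands in for a real transformation.
const additiveChecksum: ByteFn = input => {
  const sum = input.reduce((acc, b) => (acc + b) >>> 0, 0);
  const out = new Uint8Array(4);
  new DataView(out.buffer).setUint32(0, sum, true); // little-endian
  return out;
};

const probeSamples = [new Uint8Array([1, 2, 3]), new Uint8Array([9, 9, 9, 9])];
```

With these samples, `looksHollow(zeroStub, probeSamples)` and `looksHollow(passThroughCopy, probeSamples)` both report a hollow function, while `additiveChecksum` passes because its output differs per input and is not a copy of the input.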

tests/core/task-graph-builder.test.ts

Lines changed: 86 additions & 0 deletions
@@ -846,6 +846,92 @@ describe('buildTaskGraph', () => {
       );
     }
   });
+
+  it('should include behavioral acceptance criteria for tasks with functions', async () => {
+    const dbPath = join(tempDir, 'kb.db');
+    const db = createTestDb(dbPath);
+    const f = insertFile(db, 'src/codec.c');
+    insertSymbol(db, f, 'compress', 'function', 1, 80);
+    db.close();
+
+    const result = await buildTaskGraph({ ...DEFAULT_OPTIONS, kbDbPath: dbPath });
+    const task = result.tasks.find(t =>
+      t.symbols?.some(s => s.name === 'compress'),
+    );
+    expect(task).toBeDefined();
+    expect(task!.acceptanceCriteria).toEqual(
+      expect.arrayContaining([
+        expect.stringMatching(/fully implemented/i),
+        expect.stringMatching(/behavioral equivalence/i),
+        expect.stringMatching(/implementation depth/i),
+      ]),
+    );
+  });
+
+  it('should include behavioral parity checks for tasks with functions', async () => {
+    const dbPath = join(tempDir, 'kb.db');
+    const db = createTestDb(dbPath);
+    const f = insertFile(db, 'src/hash.c');
+    insertSymbol(db, f, 'xxh64', 'function', 1, 80);
+    db.close();
+
+    const result = await buildTaskGraph({ ...DEFAULT_OPTIONS, kbDbPath: dbPath });
+    const task = result.tasks.find(t =>
+      t.symbols?.some(s => s.name === 'xxh64'),
+    );
+    expect(task).toBeDefined();
+    expect(task!.parityChecks).toEqual(
+      expect.arrayContaining([
+        expect.stringMatching(/code paths.*reachable/i),
+        expect.stringMatching(/hollow implementations/i),
+        expect.stringMatching(/call chains.*wired/i),
+      ]),
+    );
+  });
+
+  it('should not include behavioral criteria for type-only tasks', async () => {
+    const dbPath = join(tempDir, 'kb.db');
+    const db = createTestDb(dbPath);
+    const f = insertFile(db, 'src/types.c');
+    insertSymbol(db, f, 'Options', 'struct', 1, 40);
+    insertSymbol(db, f, 'Mode', 'enum', 41, 60);
+    db.close();
+
+    const result = await buildTaskGraph({ ...DEFAULT_OPTIONS, kbDbPath: dbPath });
+    const task = result.tasks.find(t =>
+      t.symbols?.every(s => s.kind === 'struct' || s.kind === 'enum'),
+    );
+    expect(task).toBeDefined();
+    // Should NOT have function-specific behavioral criteria
+    expect(task!.acceptanceCriteria.join(' ')).not.toMatch(/fully implemented/i);
+    expect(task!.parityChecks.join(' ')).not.toMatch(/hollow implementations/i);
+  });
+
+  it('should apply behavioral criteria uniformly regardless of task size', async () => {
+    const dbPath = join(tempDir, 'kb.db');
+    const db = createTestDb(dbPath);
+    const f = insertFile(db, 'src/util.c');
+    // Small function — above micro-elision threshold but still simple
+    insertSymbol(db, f, 'crc32', 'function', 1, 40);
+    db.close();
+
+    const result = await buildTaskGraph({ ...DEFAULT_OPTIONS, kbDbPath: dbPath });
+    const task = result.tasks.find(t =>
+      t.symbols?.some(s => s.name === 'crc32'),
+    );
+    expect(task).toBeDefined();
+    // Even a tiny function gets the full behavioral criteria
+    expect(task!.acceptanceCriteria).toEqual(
+      expect.arrayContaining([
+        expect.stringMatching(/implementation depth/i),
+      ]),
+    );
+    expect(task!.parityChecks).toEqual(
+      expect.arrayContaining([
+        expect.stringMatching(/call chains.*wired/i),
+      ]),
+    );
+  });
 });
 
 // ─── buildDependencySummary Tests ───────────────────────────────────────────
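
The gating these tests pin down can be modeled without the knowledge-base fixtures. The sketch below is a simplified stand-in, not the project's code: the `Cluster` and `SymbolInfo` shapes are assumptions (the real types are not shown in this diff), the criteria strings are abbreviated, and `containsAllPatterns` is a hypothetical plain-TypeScript analogue of the `expect.arrayContaining([expect.stringMatching(...)])` assertions used above.

```typescript
// Assumed, simplified shapes; the real Cluster type is not shown in the diff.
type SymbolKind = 'function' | 'method' | 'type' | 'class' | 'struct' | 'enum';
interface SymbolInfo {
  name: string;
  kind: SymbolKind;
}
interface Cluster {
  symbols: SymbolInfo[];
}

// Same predicate the diff uses to decide whether behavioral criteria apply.
function clusterHasFunctions(cluster: Cluster): boolean {
  return cluster.symbols.some(s => s.kind === 'function' || s.kind === 'method');
}

// Abbreviated criteria text; the diff carries the full wording.
function behavioralAcceptanceCriteria(cluster: Cluster): string[] {
  if (!clusterHasFunctions(cluster)) return [];
  return [
    'Every function body is fully implemented',
    'Behavioral equivalence: same observable outputs as the source',
    'Implementation depth matches source complexity',
  ];
}

// True when every pattern matches at least one item, in any order.
function containsAllPatterns(items: string[], patterns: RegExp[]): boolean {
  return patterns.every(p => items.some(item => p.test(item)));
}

const acceptancePatterns = [
  /fully implemented/i,
  /behavioral equivalence/i,
  /implementation depth/i,
];

const codecCluster: Cluster = {
  symbols: [{ name: 'compress', kind: 'function' }],
};
const typesOnlyCluster: Cluster = {
  symbols: [
    { name: 'Options', kind: 'struct' },
    { name: 'Mode', kind: 'enum' },
  ],
};
```

In this model, `codecCluster` yields criteria that satisfy all three patterns, while `typesOnlyCluster` yields none, mirroring the first and third tests. Matching on short, case-insensitive fragments rather than full strings keeps the assertions stable if the criteria wording is later refined.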

tests/fixtures/zstd-c-project/migration.config.json

Lines changed: 3 additions & 2 deletions
@@ -3,8 +3,9 @@
   "guidance": [
     "Do NOT use any existing Rust crates that wrap the C implementation (e.g. zstd, zstd-safe, zstd-sys, lz4-sys). Write a pure native Rust port of the C source code.",
     "Do NOT use FFI, bindgen, or any C interop. All code must be idiomatic safe Rust.",
-    "Preserve the original algorithmic structure so that the Rust port is auditable against the C reference.",
-    "Performance parity with the C implementation is required. Retain performance-critical optimizations such as SIMD intrinsics using minimal unsafe Rust where necessary."
+    "All code must be safe Rust. Do NOT use `unsafe` blocks, `unsafe fn`, raw pointers (`*const`, `*mut`), or `core::ffi::c_void`. Use slices (`&[u8]`, `&mut [u8]`), Vec, iterators, and Rust's standard byte-order methods (e.g. `u32::from_le_bytes`, `to_be_bytes`) instead of pointer casts and manual memory access. If a C pattern seems to require unsafe, find the safe Rust equivalent — it almost always exists.",
+    "Preserve the algorithmic logic (same algorithmic steps, same data flow) so the port is auditable against the C reference — but use idiomatic Rust types and APIs. The algorithm should be recognizable, not the function signatures or pointer patterns.",
+    "Prefer correctness and safety over micro-optimization. It is acceptable for the Rust port to be slightly slower than the C original if the alternative is unsafe code. Do not use unsafe for performance reasons."
   ],
   "source": {
     "path": "./zstd-src/zstd-1.5.7",
