fix(scanner): remove C7/C4 HIGH false positives found validating 8 real Python projects by vinicq · Pull Request #46 · vinicq/falsegreen

vinicq · 2026-06-05T17:07:14Z

What

Validating the scanner against 8 real Python projects (each >200 stars, >=500 tests) surfaced ~47 HIGH-confidence false positives in two rules. HIGH blocks commits, so these are the worst kind. This PR removes them, with regression tests, and records the measured numbers.

Projects (owner/repo): encode/httpx, encode/starlette, pallets/flask, fastapi/fastapi, encode/django-rest-framework, aio-libs/aiohttp, sanic-org/sanic, pallets/werkzeug.

The fixes

C7 (compares a value to itself). Was firing on deliberate __eq__/__hash__ tests. Now assert x == x is exempt only when the same test runs a discriminating check on the same operand: x != peer / not x == peer against a distinct peer (a constant like None does not count), or x in {x} membership in a literal container that holds x (x in some_registry does not count). A lone assert x == x still fires.

C4 (forgotten/uncollected test). Was firing on test*-named web route handlers (@app.get/@app.post/@Request.application/@click.command) and on local helper coroutines. New principle: a function that is referenced (called, awaited, scheduled via asyncio.create_task, or passed as a callback) actually runs, so it is not forgotten. Only an undecorated, no-arg nested test* with a check in its own body that is never referenced, or a top-level test-shaped function never called, is flagged. The top-level reference check is scoped to module level so an unrelated same-name local elsewhere does not excuse a real forgotten test.
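To make the C4 cases concrete, here is a minimal, runnable illustration of the three shapes described above. `FakeApp` is a hypothetical stand-in for a web framework's route registry, and the test names are invented for the example; whether the real scanner classifies each one exactly this way is an assumption based on the description in this PR.

```python
import asyncio


class FakeApp:
    """Hypothetical stand-in for a framework object exposing @app.get(...)."""

    def __init__(self):
        self.routes = {}

    def get(self, path):
        def register(handler):
            self.routes[path] = handler
            return handler
        return register


def test_route_handler_is_registered():
    app = FakeApp()

    @app.get("/test")
    def test_endpoint():
        # Decorated route handler that merely has a test*-shaped name.
        # The framework calls it, so it is not a forgotten test.
        return {"ok": True}

    assert app.routes["/test"]() == {"ok": True}


def test_background_task_runs():
    results = []

    async def test_worker():
        # Referenced below through asyncio.create_task, so it really runs.
        results.append("ran")
        assert results

    async def main():
        await asyncio.create_task(test_worker())

    asyncio.run(main())
    assert results == ["ran"]


def test_outer_with_forgotten_inner():
    ran = []

    def test_inner():
        # Undecorated, no-arg, has a check in its body, never referenced.
        # Nothing calls it, so this failing assert never executes:
        # this is the shape C4 is meant to keep flagging.
        ran.append(True)
        assert 1 + 1 == 3

    assert ran == []
```

The last test passes even though `test_inner` contains a failing assertion, which is exactly why an unreferenced nested `test*` is worth a HIGH finding while the first two are not.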

Result

0 HIGH across all 8 projects (down from ~47).
Scanner suite 90 -> 104 tests; every fix has a fires-on-bad and a stays-clean test, plus three tests closing the false-negative holes a pre-push architect review found.
Drift identical (bundled scan.py byte-for-byte), self-scan clean, ruff clean.
reference.md gains C7/C4/C18 look-alike notes; CHANGELOG records the fixes; README + VALIDATION.md carry the measured numbers (scanner 47->0, and the semantic-pass benchmark: precision 1.00 / recall 0.70 on a small model).

Closes #39
Closes #40
Closes #41

Validating the scanner against 8 real Python projects (encode/httpx, encode/starlette, pallets/flask, fastapi/fastapi, encode/django-rest-framework, aio-libs/aiohttp, sanic-org/sanic, pallets/werkzeug) surfaced HIGH false positives in two rules. HIGH blocks commits, so these are the worst kind.

C7 (self-compare): `assert x == x` is now exempt when the same test also runs a discriminating or membership check on the same operand: `assert x != y`, `assert not x == y`, or `assert x in {x}`. That pair is a deliberate __eq__/__hash__ test (reflexive AND distinguishing), not a tautology. A lone `assert x == x` still fires. Seen in aio-libs/aiohttp and encode/starlette.

C4 (forgotten test): a `test*`-named function is no longer flagged when it is a web route handler / WSGI app (@app.get/@app.post/@Request.application/@click.command) or when it is referenced: called, awaited, scheduled via asyncio.create_task, or passed as a callback. A referenced function runs, so it is not forgotten. Only a nested `test*` with a check in its own body that is never referenced, or a top-level test-shaped function never called, still fires. Seen in fastapi, werkzeug, sanic, flask, aiohttp.

Re-scan after the fix: 0 HIGH across all 8 projects (was 16+9+14+7+... before). Each fix carries a fires-on-bad and a stays-clean regression test. reference.md gains the C7/C4/C18 look-alike notes; CHANGELOG records both fixes. Bundled scanner copy kept byte-identical (drift check passes).

Closes #39
Closes #40
Closes #41

…+ LLM pass)

Scanner: 8 real projects (encode/httpx, encode/starlette, pallets/flask, fastapi/fastapi, encode/django-rest-framework, aio-libs/aiohttp, sanic-org/sanic, pallets/werkzeug), ~47 HIGH false positives in C7/C4 fixed, 0 HIGH on re-scan.

Semantic pass: first labeled benchmark (24 Python cases) run blind on a small model (Claude Haiku) scored precision 1.00, recall 0.70 (1.00 on clear-cut smells), F1 0.82. This is the evidence behind "a small model is enough for a precision-first semantic pass".

VALIDATION.md row moved from in-progress to completed; README "How falsegreen is validated" carries the figures. Raw benchmark data, the spreadsheet, and the working report stay local (.handoff/, gitignored); only the measured conclusions are published here.
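As a quick consistency check on the reported benchmark figures, F1 is the harmonic mean of precision and recall. The snippet below only reuses the two published numbers; the raw per-case counts are not published, so nothing else is assumed.

```python
# Reported semantic-pass figures from the benchmark above.
precision = 1.00
recall = 0.70

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))  # 0.82
```

The result matches the stated F1 of 0.82, so the three published figures agree with each other.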

Senior-architect pre-push review found two narrow holes where the new exemptions were broader than intended.

C7: the eq-semantics exemption fired on ANY later `!=`/`in` mentioning the operand. Now it requires a DISTINCT peer for `!=`/`not ==` (a constant like None does not count) and a literal container that holds the operand for membership (`x in {x}`, not `x in some_registry`). So `assert x == x` next to `x != None` or `x in some_registry` stays C7.

C4: the top-level forgotten-test exemption used name_is_used over the whole module, so an unrelated same-name local rebinding in another function wrongly excused a real forgotten test. New name_used_at_module_level counts only a call target `name(...)` anywhere (covers `asyncio.run(main())`) or a module-level Load. Nested defs keep the broader scoped check (callbacks registered by bare name still count).

Three regression tests added for the closed holes. 104 tests green, drift identical, self-scan + ruff clean.
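The module-level scoping for C4 might look roughly like the sketch below. Only the function name `name_used_at_module_level` comes from the commit message; the signature and body are a guess at the described behaviour, and the sample sources (`verify_totals`, `HANDLERS`) are hypothetical. As a simplification, function and lambda bodies are skipped entirely, so decorators and default arguments are ignored.

```python
import ast


def name_used_at_module_level(tree, name):
    """Sketch: True if `name` is a call target anywhere, or loaded at module level."""
    # (a) A call target `name(...)` anywhere, e.g. `asyncio.run(main())`.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == name
        ):
            return True
    # (b) A Load of the name outside any function or lambda body.
    stack = list(tree.body)
    while stack:
        node = stack.pop()
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            continue  # locals inside functions must not excuse a forgotten test
        if (
            isinstance(node, ast.Name)
            and node.id == name
            and isinstance(node.ctx, ast.Load)
        ):
            return True
        stack.extend(ast.iter_child_nodes(node))
    return False


FORGOTTEN = '''
def verify_totals():
    assert 1 + 1 == 2

def helper():
    verify_totals = 3       # unrelated same-name local
    return verify_totals    # a Load, but only inside a function
'''

CALLED = '''
import asyncio

async def main():
    assert True

if __name__ == "__main__":
    asyncio.run(main())
'''

REGISTERED = '''
def verify_totals():
    assert 1 + 1 == 2

HANDLERS = [verify_totals]  # passed by bare name at module level
'''

print(name_used_at_module_level(ast.parse(FORGOTTEN), "verify_totals"))
print(name_used_at_module_level(ast.parse(CALLED), "main"))
print(name_used_at_module_level(ast.parse(REGISTERED), "verify_totals"))
```

Under these assumptions the same-name local in `helper` no longer counts as a use, while the `asyncio.run(main())` call and the module-level callback registration both do.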

vinicq · 2026-06-05T17:08:51Z

Validation (pre-merge):

Senior static-analysis review of the diff: approved. It flagged two narrow false-negative holes in the first cut (C7 exemption firing on any !=/in mentioning the operand; C4 top-level reference check too broad). Both are now closed in this PR (distinct-peer + literal-container for C7; module-level-scoped reference check for C4), each with a regression test.
CI green on 3.8 / 3.11 / 3.13. Drift check passes (bundled scan.py byte-identical). Self-scan and ruff clean. 104 scanner tests.
Re-scan of all 8 validation projects: 0 HIGH.

Merging.

vinicq added 3 commits June 5, 2026 12:28

github-actions bot added the bug label Jun 5, 2026

vinicq merged commit e958cee into main Jun 5, 2026
4 checks passed

vinicq deleted the fix/python-validation-fp branch June 5, 2026 17:08
