Skip to content

fix(scanner): remove C7/C4 HIGH false positives found validating 8 real Python projects#46

Merged
vinicq merged 3 commits into
mainfrom
fix/python-validation-fp
Jun 5, 2026
Merged

fix(scanner): remove C7/C4 HIGH false positives found validating 8 real Python projects#46
vinicq merged 3 commits into
mainfrom
fix/python-validation-fp

Conversation

@vinicq
Copy link
Copy Markdown
Owner

@vinicq vinicq commented Jun 5, 2026

What

Validating the scanner against 8 real Python projects (each >200 stars, >=500 tests) surfaced ~47 HIGH-confidence false positives in two rules. HIGH blocks commits, so these are the worst kind. This PR removes them, with regression tests, and records the measured numbers.

Projects (owner/repo): encode/httpx, encode/starlette, pallets/flask, fastapi/fastapi, encode/django-rest-framework, aio-libs/aiohttp, sanic-org/sanic, pallets/werkzeug.

The fixes

C7 (compares a value to itself). Was firing on deliberate __eq__/__hash__ tests. Now assert x == x is exempt only when the same test runs a discriminating check on the same operand: x != peer / not x == peer against a distinct peer (a constant like None does not count), or x in {x} membership in a literal container that holds x (x in some_registry does not count). A lone assert x == x still fires.

C4 (forgotten/uncollected test). Was firing on test*-named web route handlers (@app.get/@app.post/@Request.application/@click.command) and on local helper coroutines. New principle: a function that is referenced (called, awaited, scheduled via asyncio.create_task, or passed as a callback) actually runs, so it is not forgotten. Only an undecorated, no-arg nested test* with a check in its own body that is never referenced, or a top-level test-shaped function never called, is flagged. The top-level reference check is scoped to module level so an unrelated same-name local elsewhere does not excuse a real forgotten test.

Result

  • 0 HIGH across all 8 projects (down from ~47).
  • Scanner suite 90 -> 104 tests; every fix has a fires-on-bad and a stays-clean test, plus three tests closing the false-negative holes a pre-push architect review found.
  • Drift identical (bundled scan.py byte-for-byte), self-scan clean, ruff clean.
  • reference.md gains C7/C4/C18 look-alike notes; CHANGELOG records the fixes; README + VALIDATION.md carry the measured numbers (scanner 47->0, and the semantic-pass benchmark: precision 1.00 / recall 0.70 on a small model).

Closes #39
Closes #40
Closes #41

vinicq added 3 commits June 5, 2026 12:28
Validating the scanner against 8 real Python projects (encode/httpx,
encode/starlette, pallets/flask, fastapi/fastapi, encode/django-rest-framework,
aio-libs/aiohttp, sanic-org/sanic, pallets/werkzeug) surfaced two HIGH false
positives. HIGH blocks commits, so these are the worst kind.

C7 (self-compare): `assert x == x` is now exempt when the same test also runs a
discriminating or membership check on the same operand - `assert x != y`,
`assert not x == y`, or `assert x in {x}`. That pair is a deliberate
__eq__/__hash__ test (reflexive AND distinguishing), not a tautology. A lone
`assert x == x` still fires. Seen in aio-libs/aiohttp and encode/starlette.

C4 (forgotten test): a `test*`-named function is no longer flagged when it is a
web route handler / WSGI app (@app.get/@app.post/@Request.application/
@click.command) or when it is referenced - called, awaited, scheduled via
asyncio.create_task, or passed as a callback. A referenced function runs, so it
is not forgotten. Only a nested `test*` with a check in its own body that is
never referenced, or a top-level test-shaped function never called, still fires.
Seen in fastapi, werkzeug, sanic, flask, aiohttp.

Re-scan after the fix: 0 HIGH across all 8 projects (was 16+9+14+7+... before).
Each fix carries a fires-on-bad and a stays-clean regression test. reference.md
gains the C7/C4/C18 look-alike notes; CHANGELOG records both fixes. Bundled
scanner copy kept byte-identical (drift check passes).

Closes #39
Closes #40
Closes #41
…+ LLM pass)

Scanner: 8 real projects (encode/httpx, encode/starlette, pallets/flask,
fastapi/fastapi, encode/django-rest-framework, aio-libs/aiohttp, sanic-org/sanic,
pallets/werkzeug), ~47 HIGH false positives in C7/C4 fixed, 0 HIGH on re-scan.

Semantic pass: first labeled benchmark (24 Python cases) run blind on a small
model (Claude Haiku) scored precision 1.00, recall 0.70 (1.00 on clear-cut
smells), F1 0.82 - the evidence behind "a small model is enough for a
precision-first semantic pass". VALIDATION.md row moved from in-progress to
completed; README "How falsegreen is validated" carries the figures.

Raw benchmark data, the spreadsheet, and the working report stay local
(.handoff/, gitignored); only the measured conclusions are published here.
Senior-architect pre-push review found two narrow holes where the new
exemptions were broader than intended:

C7: the eq-semantics exemption fired on ANY later `!=`/`in` mentioning the
operand. Now it requires a DISTINCT peer for `!=`/`not ==` (a constant like
None does not count) and a literal container that holds the operand for
membership (`x in {x}`, not `x in some_registry`). So `assert x == x` next to
`x != None` or `x in some_registry` stays C7.

C4: the top-level forgotten-test exemption used name_is_used over the whole
module, so an unrelated same-name local rebinding in another function wrongly
excused a real forgotten test. New name_used_at_module_level counts only a call
target `name(...)` anywhere (covers `asyncio.run(main())`) or a module-level
Load. Nested defs keep the broader scoped check (callbacks registered by bare
name still count).

Three regression tests added for the closed holes. 104 tests green, drift
identical, self-scan + ruff clean.
@github-actions github-actions Bot added the bug Something isn't working label Jun 5, 2026
@vinicq
Copy link
Copy Markdown
Owner Author

vinicq commented Jun 5, 2026

Validation (pre-merge):

  • Senior static-analysis review of the diff: approved. It flagged two narrow false-negative holes in the first cut (C7 exemption firing on any !=/in mentioning the operand; C4 top-level reference check too broad). Both are now closed in this PR (distinct-peer + literal-container for C7; module-level-scoped reference check for C4), each with a regression test.
  • CI green on 3.8 / 3.11 / 3.13. Drift check passes (bundled scan.py byte-identical). Self-scan and ruff clean. 104 scanner tests.
  • Re-scan of all 8 validation projects: 0 HIGH.

Merging.

@vinicq vinicq merged commit e958cee into main Jun 5, 2026
4 checks passed
@vinicq vinicq deleted the fix/python-validation-fp branch June 5, 2026 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

1 participant