docs(validation): record TypeScript semantic-pass benchmark numbers #47

Merged: vinicq merged 1 commit into main from docs/ts-validation-numbers on Jun 5, 2026

@vinicq vinicq commented Jun 5, 2026

Records the measured TypeScript validation numbers (conclusions only; raw data and the spreadsheet stay local).

TypeScript is covered by the LLM semantic pass alone (the scanner is Python-only by design). A 20-case labeled benchmark (8 rotten / 12 sound) in Jest/Vitest idioms, run blind on a small model (Claude Haiku):

  • precision 1.00 (no false alarms on the 12 sound tests)
  • recall 0.625 overall (5 of the 8 rotten tests caught), 1.00 on the clear-cut smells
  • F1 0.77, case attribution 4/5
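
For readers who want to check the arithmetic, here is a minimal sketch of how the three headline metrics follow from the counts. The counts are inferred from the figures above (8 rotten cases, three misses, no false alarms on the 12 sound tests); the function name `precision_recall_f1` is hypothetical and is not part of falsegreen.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion counts.

    Hypothetical helper for illustration; empty denominators yield 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts inferred from the PR text (an assumption, not raw benchmark data):
# 8 rotten cases with 3 misses -> 5 true positives, 3 false negatives;
# no false alarms on the 12 sound tests -> 0 false positives.
tp, fp, fn = 5, 0, 3

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.3f} F1={f1:.2f}")
# prints: precision=1.00 recall=0.625 F1=0.77
```

With perfect precision, F1 is determined by recall alone (2r / (1 + r)), so the 0.77 figure moves only if one of the three boundary cases starts being caught.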

The three misses are the same boundary cases as the Python run (a pure-delegation passthrough asserted through an edge mock, and a trivial single-operator formula), already tracked in #43 and #44. Cross-language reproduction supports the claim that the pass carries beyond Python.

Docs-only: README "How falsegreen is validated" and VALIDATION.md.

TypeScript is LLM-only (the scanner is Python-only by design). A 20-case labeled
benchmark (8 rotten / 12 sound, Jest/Vitest) run blind on a small model (Claude
Haiku) scored precision 1.00, recall 0.625 (1.00 on clear-cut smells), F1 0.77.
The three misses are the same boundary cases as the Python run (pure-delegation
passthrough, trivial one-operator formula), already tracked as open issues.
Cross-language reproduction supports the claim that the pass carries beyond
Python. Raw data and spreadsheet stay local (.handoff/, gitignored).
@github-actions github-actions bot added the documentation label Jun 5, 2026
@vinicq vinicq merged commit 51e0c02 into main Jun 5, 2026
4 checks passed
@vinicq vinicq deleted the docs/ts-validation-numbers branch June 5, 2026 17:20