docs(validation): record TypeScript semantic-pass benchmark numbers by vinicq · Pull Request #47 · vinicq/falsegreen

vinicq · 2026-06-05T17:17:52Z

Records the measured TypeScript validation numbers (conclusions only; raw data and the spreadsheet stay local).

TypeScript is covered by the LLM semantic pass alone (the scanner is Python-only by design). A 20-case labeled benchmark (8 rotten / 12 sound) in Jest/Vitest idioms, run blind on a small model (Claude Haiku):

precision 1.00 (no false alarms on the 12 sound tests)
recall 0.625 overall, 1.00 on the clear-cut smells
F1 0.77, case attribution 4/5
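
The figures above can be cross-checked with a short calculation. This is a sketch, not part of the benchmark harness: the confusion-matrix counts are inferred from the reported numbers (8 rotten cases at recall 0.625 implies 5 caught and 3 missed; precision 1.00 on 12 sound cases implies no false alarms), and the variable names are illustrative.

```python
# Counts inferred from the reported figures (an assumption, not raw data):
# 8 rotten cases, 3 missed -> 5 true positives; no false alarms on 12 sound cases.
true_positives = 5
false_negatives = 3
false_positives = 0
true_negatives = 12

# Standard definitions of precision, recall, and F1.
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.3f} f1={f1:.2f}")
# prints: precision=1.00 recall=0.625 f1=0.77
```

Under these inferred counts the reported precision, recall, and F1 are mutually consistent.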

The three misses are the same boundary cases as the Python run (a pure-delegation passthrough asserted through an edge mock, and a trivial single-operator formula), already tracked in #43 and #44. Cross-language reproduction supports the claim that the pass carries beyond Python.

Docs-only: README "How falsegreen is validated" and VALIDATION.md.

TypeScript is LLM-only (the scanner is Python-only by design). A 20-case labeled benchmark (8 rotten / 12 sound, Jest/Vitest) run blind on a small model (Claude Haiku) scored precision 1.00, recall 0.625 (1.00 on clear-cut smells), F1 0.77. The three misses are the same boundary cases as the Python run (pure-delegation passthrough, trivial one-operator formula), already tracked as open issues. Cross-language reproduction supports the claim that the pass carries beyond Python. Raw data and spreadsheet stay local (.handoff/, gitignored).

github-actions bot added the documentation label Jun 5, 2026

vinicq merged commit 51e0c02 into main Jun 5, 2026
4 checks passed

vinicq deleted the docs/ts-validation-numbers branch June 5, 2026 17:20
