test(feed-engine): parser robustness / fuzz suite + regexExtract guard #164
Merged
The engine is the parse seam for untrusted upstream feed bytes. This locks its hostile-input contract (353 generated cases, ~0.5s, no infra):

- `extractRecords` (text/csv) never throws on any bytes (empty, binary, unicode, 200 KB, unterminated quotes, ragged rows).
- `applyTransforms` is now TOTAL: no op throws on any input (14 ops × 22 inputs).
- `runEngine` isolates bad records: good rows in `records`, bad in `errors`, invariant `read === ok + failed`; a malformed payload throws for the caller.
- `getPath` never throws; a malicious `__proto__` CSV header can't pollute `Object.prototype` (verified).

Fix: `regexExtract` guards `new RegExp(pattern)` (try/catch → `undefined`) so a malformed manifest pattern yields "no match" instead of throwing, which makes `applyTransforms` total.

Parity suites unaffected (381 engine tests green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
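A minimal sketch of the record-isolation contract, assuming the payload arrives as a JSON array of objects. `runIsolated`, `EngineResult`, and the `ioc` field are hypothetical illustrations, not the engine's actual `runEngine` API: each record is mapped inside its own try/catch, so one bad row lands in `errors` while the rest continue, and only a payload that is not an array throws for the caller.

```typescript
// Hypothetical types: the engine's real record/result shapes are not shown in the PR.
type RawRecord = Record<string, unknown>;

interface EngineResult<T> {
  records: T[];
  errors: { index: number; error: string }[];
  stats: { read: number; ok: number; failed: number };
}

// Maps each record in its own try/catch so one bad row cannot sink the batch.
// A payload that is not an array at all throws, leaving it to the caller
// to report the whole feed run as failed.
function runIsolated<T>(payload: unknown, mapRecord: (r: RawRecord) => T): EngineResult<T> {
  if (!Array.isArray(payload)) {
    throw new TypeError("malformed payload: expected an array of records");
  }
  const records: T[] = [];
  const errors: { index: number; error: string }[] = [];
  // Index loop (not forEach) so holes in sparse arrays are counted as failures.
  for (let index = 0; index < payload.length; index++) {
    const raw: unknown = payload[index];
    try {
      if (raw === null || typeof raw !== "object") {
        throw new TypeError("record is not an object");
      }
      records.push(mapRecord(raw as RawRecord));
    } catch (e) {
      errors.push({ index, error: e instanceof Error ? e.message : String(e) });
    }
  }
  // Invariant: read === ok + failed, since every index lands in exactly one bucket.
  return {
    records,
    errors,
    stats: { read: payload.length, ok: records.length, failed: errors.length },
  };
}

// Usage with a hypothetical `ioc` field: one good row, two bad ones.
const result = runIsolated([{ ioc: "1.2.3.4" }, null, { ioc: 42 }], (r) => {
  const ioc = r.ioc;
  if (typeof ioc !== "string") throw new Error("ioc must be a string");
  return ioc;
});
// result.records → ["1.2.3.4"]
// result.stats   → { read: 3, ok: 1, failed: 2 }
```

Iterating by index rather than with `forEach` keeps the invariant honest for sparse arrays: `forEach` skips holes, so `read` would exceed `ok + failed`.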
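One way to obtain the `getPath` and `__proto__` guarantees, sketched under assumptions: `getPathSafe` and `rowFromCsv` are hypothetical names, and the PR does not say how the engine implements this. The traversal follows own properties only, and rows are built on a null-prototype object so a `__proto__` header is stored as an ordinary key instead of reaching the inherited `__proto__` setter.

```typescript
// Hypothetical helpers: the PR states the guarantees, not the implementation.

// Resolves a dotted path through own properties only, so segments such as
// "__proto__" or "constructor" never climb into Object.prototype.
// Non-object intermediates (null, strings, numbers) yield undefined.
function getPathSafe(obj: unknown, path: string): unknown {
  let cur: unknown = obj;
  for (const key of path.split(".")) {
    if (cur === null || typeof cur !== "object") return undefined;
    if (!Object.prototype.hasOwnProperty.call(cur, key)) return undefined;
    cur = (cur as Record<string, unknown>)[key];
  }
  return cur;
}

// Builds a CSV row on a null-prototype object: there is no inherited
// __proto__ setter, so a "__proto__" header becomes a plain own key.
function rowFromCsv(headers: string[], cells: string[]): Record<string, string> {
  const row: Record<string, string> = Object.create(null);
  headers.forEach((header, i) => {
    row[header] = i < cells.length ? cells[i] : "";
  });
  return row;
}

const row = rowFromCsv(["__proto__", "ioc"], ['{"polluted":true}', "evil.example"]);
// getPathSafe(row, "ioc")                  → "evil.example"
// getPathSafe(row, "__proto__")            → '{"polluted":true}'
// getPathSafe({}, "constructor.prototype") → undefined
// Object.getPrototypeOf(row)               → null
```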
Second cross-cutting hardening item. The declarative feed-engine is the parse seam for untrusted upstream feed bytes (ThreatFox, CISA, OpenPhish today; the template every new feed follows). This locks its hostile-input contract.
Suite (353 generated cases, ~0.5s, no infra)
- `extractRecords` (text/csv) never throws on any bytes: empty, binary, unicode, 200 KB, unterminated quotes, ragged rows, comment-only.
- `applyTransforms` is now TOTAL: no op throws on any input (14 ops × 22 hostile inputs + a 200-iteration random-chain fuzz).
- `runEngine` isolates bad records: good rows → `records`, bad → `errors`, invariant `read === ok + failed`; a malformed payload throws for the caller (`engineHandler`) to catch as a feed failure.
- `getPath` never throws, and a malicious `__proto__` CSV header can't pollute `Object.prototype` (verified).

One fix
`regexExtract` now guards `new RegExp(pattern)` (try/catch → `undefined`) so a malformed manifest pattern yields "no match" instead of throwing, making `applyTransforms` a total function. Previously a bad pattern threw; `runEngine` caught it per record, but every record using that pattern failed.

Parity suites unaffected: 381 engine tests green, gateway `tsc` clean.
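A sketch of the guard, assuming a `(value, pattern, group)` signature; the real `regexExtract` parameters are not shown in the PR, so treat the signature as hypothetical. A compile failure (`SyntaxError` from `new RegExp`) returns `undefined`, the same result as a pattern that simply doesn't match, so the function has no throwing path for string patterns.

```typescript
// Hypothetical signature: (value, pattern, capture group). Only the guard
// around `new RegExp(pattern)` reflects what the PR describes.
function regexExtract(value: unknown, pattern: string, group = 1): string | undefined {
  if (typeof value !== "string") return undefined;
  let re: RegExp;
  try {
    re = new RegExp(pattern);
  } catch {
    // Malformed manifest pattern (e.g. "(" or "[a-"): SyntaxError → no match.
    return undefined;
  }
  const match = re.exec(value);
  // A missing or non-participating group also comes back as undefined.
  return match ? match[group] : undefined;
}

regexExtract("id=abc123", "id=(\\w+)"); // → "abc123"
regexExtract("id=abc123", "(");         // → undefined (pattern does not compile)
regexExtract(42, "id=(\\w+)");          // → undefined (non-string input)
```

The try/catch only covers patterns that fail to compile; it does not bound how long a valid but pathological pattern takes to match.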