Summary
Perry's regex engine rejects an escaped hyphen (\-) used as a literal inside a character class when it is flanked by other class members — e.g. /[a\- ]/ throws Invalid regular expression … invalid pattern at runtime (when the RegExp is constructed). \- is a legal way to write a literal hyphen in a JS character class, so this should compile.
This is the next blocker after #4362 (the marked codegen duplicate-symbol bug, now fixed — thanks!). With #4362 resolved, a Hono+JSX site that depends on marked now compiles and links into a native binary, but the binary crashes on boot because marked's GFM table-delimiter regex contains exactly this construct.
Environment
| | |
|---|---|
| perry | 0.5.1120 |
| OS | macOS (Darwin 25.5.0), arm64 |
| clang | Apple clang 21.0.0 |
| (real-world) marked | 18.0.4 |
Minimal repro
rx.ts:
for (const s of ['[\\-]', '[a\\- ]', '[a-z]', '[:]', '[ ]', '[a-z ]']) {
try { new RegExp(s); console.log('OK ', '/' + s + '/'); }
catch (e) { console.log('FAIL', '/' + s + '/', '->', (e as Error).message); }
}
perry compile rx.ts -o /tmp/rx && /tmp/rx
Actual output
OK /[\-]/
FAIL /[a\- ]/ -> Invalid regular expression: /[a\- ]/: invalid pattern
OK /[a-z]/
OK /[:]/
OK /[ ]/
OK /[a-z ]/
So:
[\-] (escaped hyphen as the only member) → OK
[a\- ] (escaped hyphen between other members) → FAIL
[a-z] (real range) → OK
The engine appears to mis-handle \- when it isn't the sole class member — likely treating the escaped hyphen as a range operator (and then failing on the "range") instead of as a literal hyphen.
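To make the suspected failure mode concrete, here is a toy sketch of a class-body parser in which only an unescaped - can act as a range operator. It is an illustration under assumptions, not Perry's actual code: parseClassBody and ClassItem are hypothetical names, and the sketch treats every escape as an identity escape (class escapes such as \d are out of scope).

```typescript
// Toy parser for the text between [ and ] of a character class.
// Hypothetical illustration only; this is NOT Perry's implementation.
type ClassItem =
  | { kind: "char"; value: string }
  | { kind: "range"; from: string; to: string };

function parseClassBody(body: string): ClassItem[] {
  // Step 1: tokenize, remembering whether each character was escaped.
  // Simplification: every escape is treated as an identity escape.
  const atoms: { ch: string; escaped: boolean }[] = [];
  for (let i = 0; i < body.length; i++) {
    if (body.charAt(i) === "\\" && i + 1 < body.length) {
      atoms.push({ ch: body.charAt(i + 1), escaped: true });
      i++; // skip the escaped character
    } else {
      atoms.push({ ch: body.charAt(i), escaped: false });
    }
  }

  // Step 2: only an UNESCAPED '-' with an atom on both sides forms a range.
  const items: ClassItem[] = [];
  for (let i = 0; i < atoms.length; i++) {
    const cur = atoms[i]!;
    const next = atoms[i + 1];
    const after = atoms[i + 2];
    if (next && after && next.ch === "-" && !next.escaped) {
      items.push({ kind: "range", from: cur.ch, to: after.ch });
      i += 2; // consume the '-' and the range end
    } else {
      items.push({ kind: "char", value: cur.ch });
    }
  }
  return items;
}

console.log(parseClassBody("a\\- ")); // three literal members: 'a', '-', ' '
console.log(parseClassBody("a-z"));   // one range: a..z
```

A parser that drops the "escaped" flag before step 2 would try to build a range from a to space, which is out of order and therefore invalid. That would be consistent with the failure above, though it is only a guess.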
Expected
\- inside a character class is always a literal hyphen, regardless of position. /[a\- ]/, /[:\- ]/, etc. should construct successfully (matching a, -, space).
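The expected semantics can be checked on any standard JS engine (Node, a browser) with a short snippet. The variable names below are illustrative only.

```typescript
// Under standard ECMAScript semantics, \- inside a class is a literal hyphen,
// so this class has exactly three members: 'a', '-', and ' '.
const hyphenClass = new RegExp("^[a\\- ]$");

const probes = ["a", "-", " ", "b"];
const matches = probes.map((s) => hyphenClass.test(s));

console.log(matches); // [ true, true, true, false ]
```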
Real-world impact
marked (markdown parser, extremely common transitive dep — here via a renderMarkdown helper) defines:
/ {0,3}\|?(?:[:\- ]*\|)+[\:\- ]*\n/
This regex is built at module-init, so the compiled binary throws SyntaxError: Invalid regular expression … invalid pattern and exits before serving any request. After #4362, this is the only remaining blocker to a working native build of the site — codegen + link both succeed; it dies at startup on this regex.
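For reference, a sketch of how the marked pattern behaves under standard semantics, next to an equivalent spelling that puts the hyphen last in each class so no escape is needed. Whether Perry accepts the rewritten form is an assumption and has not been tested here. The names original, rewritten, and delimiterRow are illustrative.

```typescript
// The pattern as written in marked (quoted from above).
const original = / {0,3}\|?(?:[:\- ]*\|)+[\:\- ]*\n/;

// Equivalent under standard semantics: a trailing '-' in a class is literal.
// ASSUMPTION: untested on Perry; shown only as a possible stopgap.
const rewritten = / {0,3}\|?(?:[: -]*\|)+[: -]*\n/;

const delimiterRow = "| --- | :-: |\n";

console.log(original.test(delimiterRow));  // true
console.log(rewritten.test(delimiterRow)); // true
console.log(original.test("hello\n"));     // false (no '|' present)
```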