regex: escaped hyphen \- as a literal in a character class is rejected ("invalid pattern") when flanked by other members #4425

@proggeramlug

Description

Summary

Perry's regex engine rejects an escaped hyphen (\-) used as a literal inside a character class when it is flanked by other class members — e.g. /[a\- ]/ throws Invalid regular expression … invalid pattern at runtime (when the RegExp is constructed). \- is a legal way to write a literal hyphen in a JS character class, so this should compile.

This is the next blocker after #4362 (the marked codegen duplicate-symbol bug, now fixed — thanks!). With #4362 resolved, a Hono+JSX site that depends on marked now compiles and links into a native binary, but the binary crashes on boot because marked's GFM table-delimiter regex contains exactly this construct.

Environment

perry: 0.5.1120
OS: macOS (Darwin 25.5.0), arm64
clang: Apple clang 21.0.0
marked (real-world): 18.0.4

Minimal repro

rx.ts:

for (const s of ['[\\-]', '[a\\- ]', '[a-z]', '[:]', '[ ]', '[a-z ]']) {
  try { new RegExp(s); console.log('OK  ', '/' + s + '/'); }
  catch (e) { console.log('FAIL', '/' + s + '/', '->', (e as Error).message); }
}

Compile and run:

perry compile rx.ts -o /tmp/rx && /tmp/rx

Actual output

OK   /[\-]/
FAIL /[a\- ]/ -> Invalid regular expression: /[a\- ]/: invalid pattern
OK   /[a-z]/
OK   /[:]/
OK   /[ ]/
OK   /[a-z ]/

So:

  • [\-] (escaped hyphen as the only member) → OK
  • [a\- ] (escaped hyphen between other members) → FAIL
  • [a-z] (real range) → OK

The engine appears to mis-handle \- when it isn't the sole class member — likely treating the escaped hyphen as a range operator (and then failing on the "range") instead of as a literal hyphen.
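To make the suspected failure mode concrete, here is a minimal sketch of a character-class body parser that handles the escape correctly. It is not Perry's actual code: the names `lexClassBody`, `parseClassBody`, `Atom`, and `ClassMember` are hypothetical, and only identity escapes are handled. The point it illustrates is that the lexer must remember whether a `-` was written with a backslash, so that only an unescaped `-` is ever considered a range operator.

```typescript
// Hypothetical sketch, not Perry's parser. It parses the text between '[' and ']'.
type ClassMember =
  | { kind: 'literal'; char: string }
  | { kind: 'range'; from: string; to: string };

// One lexed unit: the character it denotes, and whether it was backslash-escaped.
interface Atom { char: string; escaped: boolean }

function lexClassBody(body: string): Atom[] {
  const atoms: Atom[] = [];
  for (let i = 0; i < body.length; i++) {
    if (body.charAt(i) === '\\' && i + 1 < body.length) {
      // Identity escapes only (\-, \:, \]); escapes such as \d or \n are out of scope.
      atoms.push({ char: body.charAt(i + 1), escaped: true });
      i++;
    } else {
      atoms.push({ char: body.charAt(i), escaped: false });
    }
  }
  return atoms;
}

function parseClassBody(body: string): ClassMember[] {
  const atoms = lexClassBody(body);
  const members: ClassMember[] = [];
  let i = 0;
  while (i < atoms.length) {
    const start = atoms[i] as Atom;
    const dash = atoms[i + 1];
    const end = atoms[i + 2];
    // Only an UNESCAPED '-' with a member on each side forms a range.
    if (dash !== undefined && end !== undefined && dash.char === '-' && !dash.escaped) {
      if (start.char > end.char) throw new SyntaxError('range out of order');
      members.push({ kind: 'range', from: start.char, to: end.char });
      i += 3;
    } else {
      members.push({ kind: 'literal', char: start.char });
      i += 1;
    }
  }
  return members;
}
```

With this sketch, `parseClassBody('a\\- ')` yields three literal members (`a`, `-`, space), while `parseClassBody('a-z')` yields a single range. If the `escaped` flag were dropped during lexing, `a\- ` would be read as a range from `a` to space, which is out of order and would be rejected, matching the observed error.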

Expected

\- inside a character class is always a literal hyphen, regardless of position. /[a\- ]/, /[:\- ]/, etc. should construct successfully (matching a, -, space).
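The expected behavior can be pinned down as a small regression check. This is a sketch: the helper name `literalHyphenFailures` and the chosen inputs are illustrative, and patterns are built with `new RegExp` to mirror the repro. On a standard JavaScript engine such as V8 it returns an empty list.

```typescript
// Each case: [pattern source, input, whether the input should match].
const cases: Array<[string, string, boolean]> = [
  ['[a\\- ]', 'a', true],
  ['[a\\- ]', '-', true],
  ['[a\\- ]', ' ', true],
  ['[a\\- ]', 'b', false],  // would match if "a\- " were misread as a range
  ['[a\\- ]', '\\', false], // the backslash itself is not a class member
  ['[:\\- ]', ':', true],
  ['[:\\- ]', '-', true],
  ['[:\\- ]', 'x', false],
];

// Returns a description of every case that fails to construct or gives the wrong result.
function literalHyphenFailures(): string[] {
  const failed: string[] = [];
  for (const [source, input, expected] of cases) {
    try {
      if (new RegExp(source).test(input) !== expected) {
        failed.push(`/${source}/ on ${JSON.stringify(input)}: wrong result`);
      }
    } catch (e) {
      failed.push(`/${source}/: ${(e as Error).message}`);
    }
  }
  return failed;
}

console.log(literalHyphenFailures());
```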

Real-world impact

marked (markdown parser, extremely common transitive dep — here via a renderMarkdown helper) defines:

/ {0,3}\|?(?:[:\- ]*\|)+[\:\- ]*\n/

This regex is built at module-init, so the compiled binary throws SyntaxError: Invalid regular expression … invalid pattern and exits before serving any request. After #4362, this is the only remaining blocker to a working native build of the site — codegen + link both succeed; it dies at startup on this regex.
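For reference, the pattern can be exercised on its own, independent of marked. This is a sketch: the pattern source is copied from the regex quoted above, while the helper name `containsDelimiterRow` and the sample inputs are illustrative. Note that the pattern as quoted is unanchored, so it reports whether a delimiter-like run occurs anywhere in the input.

```typescript
// Pattern source copied from the issue; built with new RegExp so that a
// construction failure surfaces here, as it does at marked's module init.
const delimiterRow = new RegExp(' {0,3}\\|?(?:[:\\- ]*\\|)+[\\:\\- ]*\\n');

// True if the input contains a GFM table-delimiter-like run ending in a newline.
function containsDelimiterRow(text: string): boolean {
  return delimiterRow.test(text);
}

console.log(containsDelimiterRow('| --- | :-: |\n')); // true on a standard engine
console.log(containsDelimiterRow('plain text\n'));    // false: no pipe character
```

On an affected Perry build, the `new RegExp(...)` line is where the `SyntaxError` would be expected to surface, before either call runs.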
