CORE-9082#14

Open
ethan-wrasman-pkware wants to merge 2 commits into master from ew/fix-shorthand


@ethan-wrasman-pkware ethan-wrasman-pkware commented May 12, 2026

Add documentation for known limitations and fix a long-standing bug (introduced in 2014) where shorthand classes inside a character class produced invalid generated strings.

[a-z\d] was rewritten by a post-pass replaceAll to [a-z[0-9]]. Brics, the underlying regex engine, does not support nested character classes — it parsed the outer [...] as the class [a-z[0-9] followed by a literal ], so every generated string ended with one or more stray ] characters.
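
To make the failure concrete, here is a minimal sketch of a context-blind post-pass of the kind described above. The class and method names are hypothetical, and the pre-fix code itself is not shown in this description; only the input pattern and the rewritten output come from the text. The second method is just an illustration of how a parser without nested classes would split the result, not Brics code.

```java
// Hypothetical reconstruction of the pre-fix behaviour; all names are illustrative.
final class NaiveShorthandPass {

    // Context-blind rewrite: every \d becomes [0-9], even inside a character class.
    static String expand(String pattern) {
        // The regex "\\\\d" matches a literal backslash followed by 'd'.
        return pattern.replaceAll("\\\\d", "[0-9]");
    }

    // Illustration of a parser with no nested-class support: the class ends at
    // the first ']' and whatever follows is treated as ordinary pattern text.
    // Assumes the input starts with '[' and contains at least one ']'.
    static String[] splitAtFirstClose(String rewritten) {
        int close = rewritten.indexOf(']');
        return new String[] {
            rewritten.substring(0, close + 1), // what is read as the class
            rewritten.substring(close + 1)     // the leftover text
        };
    }
}
```

For the input `[a-z\d]`, `expand` returns `[a-z[0-9]]`, and splitting that at the first `]` gives the class `[a-z[0-9]` plus a leftover literal `]`, which is the stray character described above.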

Shorthand expansion now happens inline while normalizing the pattern, with awareness of whether the cursor is inside a character class:

  • Outside a class: \d → [0-9] (as before).
  • Inside a class: \d → 0-9 (class-body form, no nested brackets).
  • Negated shorthands (\D, \S, \W) inside a class expand to explicit complementary Unicode BMP ranges.
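
The three rules above can be sketched as a single pass that tracks whether the cursor is inside `[...]`. This is not the code in this PR: the class name, the method names, the ASCII definitions of `\d`, `\w`, and `\s`, and the backslash-escaping of range endpoints are all assumptions made for illustration. It also ignores edge cases such as a literal `]` placed first in a class.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only. Names and the ASCII shorthand definitions are assumptions.
final class ShorthandExpander {

    // Sorted, non-overlapping ranges for the positive form of a shorthand,
    // or null if the escaped character is not a shorthand.
    private static int[][] ranges(char shorthand) {
        switch (Character.toLowerCase(shorthand)) {
            case 'd': return new int[][] {{'0', '9'}};
            case 'w': return new int[][] {{'0', '9'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'}};
            case 's': return new int[][] {{'\t', '\n'}, {'\f', '\r'}, {' ', ' '}};
            default:  return null;
        }
    }

    // Complement of sorted ranges within the BMP (U+0000..U+FFFF).
    static int[][] complement(int[][] in) {
        List<int[]> out = new ArrayList<>();
        int next = 0;
        for (int[] r : in) {
            if (r[0] > next) out.add(new int[] {next, r[0] - 1});
            next = r[1] + 1;
        }
        if (next <= 0xFFFF) out.add(new int[] {next, 0xFFFF});
        return out.toArray(new int[0][]);
    }

    // Render ranges in class-body form (no surrounding brackets).
    static String body(int[][] rs) {
        StringBuilder sb = new StringBuilder();
        for (int[] r : rs) {
            appendChar(sb, r[0]);
            if (r[1] > r[0]) {
                sb.append('-');
                appendChar(sb, r[1]);
            }
        }
        return sb.toString();
    }

    // Backslash-escape characters that are special inside a class (assumed set).
    private static void appendChar(StringBuilder sb, int c) {
        if ("[]^-\\".indexOf(c) >= 0) sb.append('\\');
        sb.append((char) c);
    }

    static String expand(String pattern) {
        StringBuilder out = new StringBuilder();
        boolean inClass = false;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                char next = pattern.charAt(++i);
                int[][] rs = ranges(next);
                if (rs == null) {               // ordinary escape: copy through
                    out.append(c).append(next);
                    continue;
                }
                boolean negated = Character.isUpperCase(next);
                if (inClass) {
                    // Class-body form; negated shorthands become explicit ranges.
                    out.append(body(negated ? complement(rs) : rs));
                } else {
                    out.append(negated ? "[^" : "[").append(body(rs)).append(']');
                }
            } else {
                if (c == '[' && !inClass) inClass = true;
                else if (c == ']' && inClass) inClass = false;
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, `expand("\\d")` yields `[0-9]` and `expand("[a-z\\d]")` yields `[a-z0-9]`, with no nested brackets. Computing the complement from the positive ranges, rather than hard-coding it per shorthand, keeps `\D`, `\S`, and `\W` consistent with their positive counterparts.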

Also fixes a related off-by-one in [^X...] where the leading ^ was being emitted twice.

Added parameterized tests covering every shorthand in every position inside [...] (alone, with literal neighbors, with explicit ranges, in negated outer classes, under quantifiers), plus regression tests for every entry in the new LIMITATIONS.md.
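
As a rough picture of what "every shorthand in every position" means, the matrix of input patterns could be generated along these lines. The helper below is hypothetical and is not the PR's test code; the templates are one possible reading of the five positions listed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the actual tests in this PR are not shown here.
final class ShorthandCaseMatrix {
    static final String[] SHORTHANDS = {"\\d", "\\D", "\\w", "\\W", "\\s", "\\S"};

    // One template per position named in the description; %s marks the shorthand.
    static final String[] TEMPLATES = {
        "[%s]",     // alone
        "[a%sb]",   // with literal neighbors
        "[a-f%s]",  // next to an explicit range
        "[^%s]",    // inside a negated outer class
        "[%s]{3}"   // under a quantifier
    };

    // Cross product of shorthands and templates: 6 x 5 = 30 patterns.
    static List<String> patterns() {
        List<String> out = new ArrayList<>();
        for (String s : SHORTHANDS) {
            for (String t : TEMPLATES) {
                out.add(t.replace("%s", s)); // literal substitution, no format parsing
            }
        }
        return out;
    }
}
```

Each generated pattern would then be fed to the generator and every produced string checked against the same pattern in a reference matcher.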

ethan-wrasman-pkware and others added 2 commits May 13, 2026 08:09
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2 participants