Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #93 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 9 9
Lines 1622 1783 +161
Branches 261 300 +39
==========================================
+ Hits 1622 1783 +161 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Greptile SummaryThis PR adds Civet-style
Confidence Score: 5/5Safe to merge; all new codegen paths are well-tested and the dynamic regex caching logic is correct. The all-const self-overwriting closure, the mixed-path source/util.civet — the "RD" decompile path silently drops escaped spaces; worth a follow-up if decompile round-trip fidelity matters for tooling built on top of Hera. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["HeregexBody parsed\n(RegExpPart[])"] --> B{All parts\nare strings?}
B -- Yes --> C["Return ['R', body.join('')]\n(static regex, compiled once)"]
B -- No --> D["Return ['RD', parts]\n(dynamic regex)"]
D --> E["compiler: defineDynamicRe(parts)\n→ $RDi id (deduped by JSON.stringify)"]
E --> F{All parts\nare const/string?}
F -- Yes --> G["All-const path\nlet $RDi = self-overwriting closure\nEvaluates once, replaces itself with $R(...)"]
F -- No --> H["Mixed path\nconst $RDi = closure\n$RDi.constN ??= String(expr)\n$RDi.source cached, rebuilt on change"]
G --> I["$EXPECT($RDi, errorFn)"]
H --> I
I --> J{defaultHandler?}
J -- Yes --> K["$R$0($EXPECT($RDi, …))\nextracts match[0]"]
J -- No --> L["$EXPECT($RDi, …)\nfull RegExpMatchArray\nto handler $0…$9"]
Reviews (2): Last reviewed commit: "Forbid empty heregex" | Re-trigger Greptile |
| HeregexSubstitutionCharacter | ||
| [^}\\]+ | ||
| EscapeSequence |
There was a problem hiding this comment.
Silent truncation on nested
{} in interpolation expressions
HeregexSubstitutionCharacter is [^}\\]+, so any unescaped } inside an interpolation expression terminates parsing of the expression. A user writing /// ${getPattern({key: val})} /// would silently have the expression truncated to getPattern({key: val, then the remainder would be parsed as heregex body content — potentially producing incorrect regex without an error. This is a sharp edge worth documenting explicitly (or enforcing with a compile-time error), since the failure is silent and the result may still be a valid regex with wrong semantics.
There was a problem hiding this comment.
I added basic brace matching, skipping over single and double quoted strings. It's probably) not perfect (unbalanced braces in comments or template strings would fail) but it's simple, language-agnostic, and hopefully enough.
| HeregexBody | ||
| !TripleSlash HeregexPart* -> $2 |
There was a problem hiding this comment.
Empty heregex
/// /// (only whitespace) works but ////// silently falls through
The !TripleSlash negative lookahead at the top of HeregexBody means the body is required to begin with something other than ///. Without any whitespace the body immediately sees ///, !TripleSlash fails, and HeregexBody does not match, causing the rule to fall through to "/" !Space $RegExpCharacter* "/" instead of being recognised as an empty heregex. In practice /// /// (with a space) does produce ["R", ""] correctly, so the restriction only surfaces for the degenerate ////// form. Still, the !TripleSlash guard appears redundant: HeregexPart* naturally stops when it reaches /// because no alternative can consume it, so the guard could be removed without changing semantics for all non-degenerate cases.
There was a problem hiding this comment.
This matches how Civet does things. To be honest, I'm not sure why it's like this, and not just a + on the Parts...
There was a problem hiding this comment.
Could probably improve it in Civet since it currently fails to parse.
There was a problem hiding this comment.
I realized why Civet forbids //////: it would transpile to // which is a comment, not a regex. So I've reproduced that behavior here. Empty regexes aren't useful in Hera (I don't think...). Hmm, but // does work in Hera (always matches)... Should I forbid both or allow both? Perhaps // should be reserved for comments?
That said, Civet has some broken edge cases: ///// opens heregex and then is a comment, which can build //. Also /// /// compiles to // which is bad (invalid JS). Hera doesn't have this issue because it uses new RegExp.
There was a problem hiding this comment.
CoffeeScript compiles to /(?:)/ which we could match.
There was a problem hiding this comment.
I enabled ////// given that // works; both produce empty regular expressions.
I also changed Hera to use JS regexp literals, including /(?:)/ for //. This makes for cleaner output IMO, though the size reduction is minimal: Civet's main.mjs goes from 1289242 to 1286867 bytes, a reduction of 2375.
|
@greptileai revise your review |
Fixes #87.
///...///heregex-style regex literals with whitespace and//comments ignored.${...}dynamic interpolation in heregexes. These can change every time the regex attempts to match!${const ...}interpolation for regex parts that should be captured once, instead of updated dynamically.$RDi.samples/heregex.hera./.../now compiles to JS literal/.../.