Regex Explainer #22

@ms-shashank

Current file: app/src/lib/tools/regex-explainer.ts
Current model: deepseek-v3.2
Current approach: Single prompt asking for explanation, component breakdown, examples, pitfalls, and optimization. No programmatic regex parsing, no actual match testing.

Problems with current approach:

  • Example strings (what matches and what does not) are not verified; the LLM may provide strings that do not actually match.
  • Component breakdown may have incorrect descriptions for complex patterns (lookaheads, backreferences).
  • No support for different regex flavors (Python, JavaScript).
  • Optimization suggestions are not validated.
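
The first problem above (unverified examples) can be checked mechanically. Below is a minimal sketch, assuming the check runs in Python with the standard `re` module. The function name `verify_examples` and the report layout are hypothetical, and `search` semantics (match anywhere in the string) are an assumption; swap in `fullmatch` if the tool treats patterns as whole-string matchers.

```python
import re


def verify_examples(pattern, should_match, should_not_match, flags=0):
    """Check LLM-claimed examples against the real regex engine.

    Returns a dict with two lists of (text, claim) pairs:
    'confirmed' for claims the engine agrees with, 'rejected' otherwise.
    """
    compiled = re.compile(pattern, flags)  # raises re.error on an invalid pattern
    report = {"confirmed": [], "rejected": []}

    for text in should_match:
        # Assumption: "matches" means the pattern is found anywhere in the text.
        key = "confirmed" if compiled.search(text) else "rejected"
        report[key].append((text, "match"))

    for text in should_not_match:
        key = "rejected" if compiled.search(text) else "confirmed"
        report[key].append((text, "no match"))

    return report


if __name__ == "__main__":
    result = verify_examples(
        r"^\d{3}-\d{4}$",
        should_match=["555-1234", "5551234"],
        should_not_match=["abc", "123-4567"],
    )
    print(result["rejected"])  # claims the engine disagrees with
```

Only the `confirmed` entries would be passed on to the explanation step; anything in `rejected` is dropped or regenerated.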

Upgrade plan:

| Step | Agent | Action |
| --- | --- | --- |
| 1 | Regex Parser | Programmatic: Parse the regex using the Python `re` module. Extract named groups, quantifiers, anchors, lookaheads, and character classes. Generate a structured, AST-like representation. |
| 2 | Match Tester | Programmatic: Generate candidate test strings and run them against the regex. Verify which strings match and which do not. This guarantees example accuracy. |
| 3 | Explanation Agent | Receive the parsed structure and verified test results. Generate a plain-English explanation, component breakdown table, pitfall analysis, and optimization suggestions. |
| 4 | Optimization Validator | Programmatic: If the LLM suggests an optimized regex, test it against the same corpus to verify functional equivalence. |
  • The plan above is a reference layout only. You are free to extend or restructure the agent stack if needed.
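
Step 1 of the plan could be sketched as follows. One caveat: the Python `re` module does not expose a public parsing API. This sketch leans on the interpreter's internal parser (`re._parser` on Python 3.11+, `sre_parse` on older versions), which is private and undocumented, so its node layout may change between releases. The names `summarize` and `_walk` are hypothetical, and the output is a flat count of node types, not a full AST.

```python
import re
from collections import Counter

# Assumption: relying on CPython's private regex parser. This is NOT a
# public API; treat it as a prototype aid and pin the Python version.
try:
    import re._parser as sre_parser  # Python 3.11+
except ImportError:
    import sre_parse as sre_parser  # older Python versions


def _walk(node):
    """Yield the name of every parse-tree operation, depth first."""
    if isinstance(node, sre_parser.SubPattern):
        for op, arg in node.data:
            yield getattr(op, "name", str(op))
            yield from _walk(arg)
    elif isinstance(node, (list, tuple)):
        # Arguments such as (min, max, subpattern) may hold nested subpatterns.
        for item in node:
            yield from _walk(item)


def summarize(pattern):
    """Return named groups, group count, and a count of parser node types."""
    compiled = re.compile(pattern)  # public API; raises re.error if invalid
    ops = Counter(_walk(sre_parser.parse(pattern)))
    return {
        "named_groups": dict(compiled.groupindex),
        "group_count": compiled.groups,
        "ops": dict(ops),
    }


if __name__ == "__main__":
    print(summarize(r"^(?P<year>\d{4})-(?=\d)"))
```

In the resulting `ops` counts, `AT` corresponds to anchors, `MAX_REPEAT` and `MIN_REPEAT` to quantifiers, `ASSERT` and `ASSERT_NOT` to lookarounds, and `IN` to character classes. A dedicated third-party regex parser may be a sturdier choice if the private module proves too brittle.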

Model suggestions to start with:

  • Step 3: Try deepseek-v3.2 (current model, already decent at regex). Also try deepseek-r1-0528 for complex patterns with lookaheads/backreferences.
  • This tool benefits more from the programmatic steps than from model upgrades. Focus engineering effort on Steps 1, 2, and 4.

Model Selection Guidance

  • You are free to pick any model from the Oxlo catalog based on your own testing and evaluation.
  • The model suggestions above are starting points, not mandates. Try them first, and if they do not meet the accuracy target, experiment with alternatives.

Compare against: GPT 5.3 Thinking & Claude Sonnet 4.6.

Acceptance criteria:

  • All "match" and "no match" examples must be verified programmatically (zero false examples).
  • Component breakdown must correctly describe every segment of the regex.
  • Optimized regex (if suggested) must be functionally equivalent (verified by testing against 30+ strings).
  • Overall accuracy of 80% or higher.
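
The equivalence criterion above could be checked with a sketch along these lines, again assuming Python's `re` module. The function name `find_disagreements` and the `min_corpus` parameter are hypothetical; the default of 30 mirrors the "30+ strings" requirement. Comparing match spans under `search` is one possible definition of "functionally equivalent"; capture groups are not compared here.

```python
import re


def find_disagreements(original, candidate, corpus, flags=0, min_corpus=30):
    """Return the corpus strings on which the two patterns behave differently.

    Each result is (text, original_span, candidate_span); a span is None
    when that pattern did not match. An empty list means no disagreement
    was found on this corpus.
    """
    if len(corpus) < min_corpus:
        raise ValueError(f"corpus has {len(corpus)} strings, need {min_corpus}+")

    a = re.compile(original, flags)
    b = re.compile(candidate, flags)

    diffs = []
    for text in corpus:
        match_a, match_b = a.search(text), b.search(text)
        span_a = match_a.span() if match_a else None
        span_b = match_b.span() if match_b else None
        if span_a != span_b:
            diffs.append((text, span_a, span_b))
    return diffs


if __name__ == "__main__":
    corpus = [f"id-{i}" for i in range(30)]
    print(find_disagreements(r"[0-9][0-9]*", r"[0-9]+", corpus))
    print(len(find_disagreements(r"[0-9]+", r"[0-9]", corpus)))
```

Agreement on a finite corpus is evidence of equivalence, not a proof, so the corpus should include edge cases (empty strings, boundary lengths, near misses) as well as typical inputs.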

Metadata

Labels: enhancement