Skip to content

Conversation

@joshrotenberg
Copy link
Contributor

This PR refactors the raw string literal parsing to use explicit character-by-character processing instead of the current string replacement approach. This makes the escape handling logic clearer and aligns the code more directly with the JEP-12 specification.

Changes

  • Refactor consume_raw_string in lexer to process characters explicitly
  • Add documentation for raw string literals to rustdoc
  • Add comprehensive tests for escape handling edge cases

Behavior

No behavioral changes - this is a refactor for clarity:

  • \' produces a literal single quote (to avoid delimiter collision)
  • All other backslashes are preserved literally

Testing

  • All 861 compliance tests pass
  • Added comprehensive unit tests covering 25+ escape sequence patterns
  • Doc tests verify examples work correctly
  • Verified identical output to previous implementation across ~20,000 generated test cases

Refactors raw string parsing to use explicit character-by-character
processing instead of string replacement. This makes the escape
handling logic clearer and matches the JEP-12 specification more
directly.

Behavior is unchanged:
- \' produces a literal single quote
- All other backslashes are preserved literally

Adds comprehensive tests and documentation for raw string literals.

See: https://github.com/jmespath/jmespath.jep/blob/main/proposals/0012-raw-string-literals.md
// Read until closing quote, then process escapes
self.consume_inside(pos, '\'', |s| {
Ok(Literal(Rcvar::new(Variable::String(s.replace("\\'", "'")))))
// Only \' is an escape sequence - replace with literal quote
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this differ from existing behaviour?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No functional difference, just a little bit more explicit. The PR is mostly for docs/tests/clarity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK what about perf difference, I am going to assume this is slower than the standard replace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it was a bit slower. Just checked in an optimization that fixes that (and seems to make it a touch faster than the original).

- Add fast path check for strings without backslashes (common case)
- Remove peekable() overhead in favor of direct pattern matching
- Benchmark shows 8% improvement over original replace() approach
- 35% faster than initial char-by-char implementation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants