Test262 #9: CharacterClassEscape parser bug — generated tests with full Unicode ranges (12 tests) #108

@nickna

Motivation

Surfaced during the #69 RegExp rollout (commit b968a9e). 12 generated tests under test/built-ins/RegExp/CharacterClassEscapes/ consistently fail at parse time. All have the same shape: they call a buildString(...) helper from regExpUtils.js (in the Test262 harness) with a configuration that enumerates the full Unicode code-point range.

Sample failing tests

  • test/built-ins/RegExp/CharacterClassEscapes/character-class-digit-class-escape-negative-cases.js
  • character-class-digit-class-escape-positive-cases.js
  • character-class-non-digit-class-escape-{negative,positive}-cases.js
  • character-class-non-whitespace-class-escape-{negative,positive}-cases.js
  • character-class-non-word-class-escape-{negative,positive}-cases.js
  • character-class-whitespace-class-escape-{negative,positive}-cases.js
  • character-class-word-class-escape-{negative,positive}-cases.js

The body shape:

const str = buildString({
  loneCodePoints: [],
  ranges: [
    [0x00DC00, 0x00DFFF],
    [0x000000, 0x00002F],
    [0x00003A, 0x00DBFF],
    [0x00E000, 0x10FFFF]
  ]
});
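For orientation, here is a minimal sketch of what a helper like buildString is assumed to do: concatenate every code point of each inclusive range into one string, working in chunks so that argument lists stay small. The name buildStringSketch and the chunk size are assumptions made for illustration; this is not the harness's actual source in regExpUtils.js, and it is meant to be run under a known-good engine such as Node.js.

```javascript
// Hypothetical stand-in for the harness helper; not copied from regExpUtils.js.
function buildStringSketch({ loneCodePoints, ranges }) {
  const CHUNK_SIZE = 10000; // assumed chunk size, keeps argument lists small
  let result = String.fromCodePoint(...loneCodePoints);
  for (const [start, end] of ranges) {
    let chunk = [];
    for (let cp = start; cp <= end; cp++) {
      chunk.push(cp);
      if (chunk.length === CHUNK_SIZE) {
        // Flush a full chunk and start a new one.
        result += String.fromCodePoint(...chunk);
        chunk = [];
      }
    }
    // Flush whatever is left over for this range.
    result += String.fromCodePoint(...chunk);
  }
  return result;
}

// Small usage example: one lone code point plus the ASCII digit range.
const sample = buildStringSketch({
  loneCodePoints: [0x41],
  ranges: [[0x30, 0x39]]
});
console.log(sample); // "A0123456789"
```

If the engine under test mishandles any construct used here (nested array literals, String.fromCodePoint, large string concatenation), that would be consistent with the failures described, but the actual diagnostic should decide.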

Impact

12 tests across both modes. Small bucket but worth tracking — these are the only ParseErrors in built-ins/RegExp and they form a coherent group.

Likely root causes (to investigate)

Possibilities, in rough order of likelihood:

  1. includes: [regExpUtils.js] not loading correctly. The Test262 harness file isn't being assembled into the test program. Check Test262HarnessAssembler.cs.
  2. Object literal with an array of 2-element arrays. [[0x000000, 0x00002F], ...] could trip a parser ambiguity around [[ (array vs tagged template).
  3. Hex code-point range exceeding the BMP. 0x10FFFF is outside the BMP, and the parser may handle code-point literals differently from code-point values used as array elements.
  4. String size. The eventual buildString(...) call produces a string of ~1.1M code points (about 2.2M UTF-16 code units, since astral code points take two each). Some interpreter path may not handle strings that large; this is unlikely at parse time, but worth checking if the helper inlines the result.
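Hypotheses 2 and 3 can be isolated with tiny sources, and hypothesis 4 can be sized with plain arithmetic. The sketch below is illustrative only: the probe names and the failingProbes and measure helpers are made up for this example. It assumes the engine under test supports the Function constructor; if it does not, the same probe strings could be saved as individual files and passed to the interpreter instead.

```javascript
// Hypothetical probe sources, one per construct suspected above.
const probes = {
  nestedArrays: "var r = [[0x000000, 0x00002F], [0x00003A, 0x00DBFF]];",
  objectWithNestedArrays: "var o = { loneCodePoints: [], ranges: [[0x00E000, 0x10FFFF]] };",
  astralHexLiteral: "var n = 0x10FFFF;",
  paddedHexLiteral: "var z = 0x000000;"
};

// Returns the names of probes that fail to parse in the current engine.
function failingProbes(sources) {
  const failures = [];
  for (const [name, src] of Object.entries(sources)) {
    try {
      new Function(src); // parses the body without running it
    } catch (e) {
      if (e instanceof SyntaxError) failures.push(name);
      else throw e;
    }
  }
  return failures;
}

// Counts code points and UTF-16 code units for inclusive ranges,
// without building the string itself.
function measure(ranges) {
  let codePoints = 0;
  let codeUnits = 0;
  for (const [start, end] of ranges) {
    const count = end - start + 1;
    // Portion of the range at or above U+10000 (zero for BMP-only ranges).
    const astral = Math.max(0, end - Math.max(start, 0x10000) + 1);
    codePoints += count;
    codeUnits += count + astral; // astral code points take two code units
  }
  return { codePoints, codeUnits };
}

const size = measure([
  [0x00DC00, 0x00DFFF],
  [0x000000, 0x00002F],
  [0x00003A, 0x00DBFF],
  [0x00E000, 0x10FFFF]
]);
console.log(failingProbes(probes), size);
```

Under a conformant engine the failure list is empty, and the ranges quoted in this issue come to 1,114,102 code points and 2,162,678 UTF-16 code units. Any probe name reported by the engine under test points at the construct to investigate first.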

Suggested approach

  1. Run one failing test directly: dotnet run -- test/built-ins/RegExp/CharacterClassEscapes/character-class-digit-class-escape-negative-cases.js. Capture the actual parse-error diagnostic — it should localize the bug.
  2. Apply the targeted fix.
  3. Cross-check by running the other 11 tests (they'll likely all flip together).
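For the cross-check in step 3, the 12 paths can be generated from the naming pattern in the sample list rather than typed by hand. This is a small sketch: the helper name failingTestPaths is hypothetical, and the command shape is copied from step 1 on the assumption that every test is run the same way.

```javascript
// Directory and file-name pattern are taken from the list of failing tests.
const DIR = "test/built-ins/RegExp/CharacterClassEscapes/";
const CLASSES = ["digit", "non-digit", "non-whitespace", "non-word", "whitespace", "word"];
const POLARITIES = ["negative", "positive"];

// Builds one path per (class, polarity) pair: 6 x 2 = 12 paths.
function failingTestPaths() {
  const paths = [];
  for (const cls of CLASSES) {
    for (const polarity of POLARITIES) {
      paths.push(`${DIR}character-class-${cls}-class-escape-${polarity}-cases.js`);
    }
  }
  return paths;
}

// One command line per test, in the shape used in step 1.
const commands = failingTestPaths().map((p) => `dotnet run -- ${p}`);
console.log(commands.join("\n"));
```

The printed lines can be pasted into a shell or redirected into a script, which makes it easy to confirm whether all 12 tests flip together after the fix.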

Acceptance

  • All 12 ParseError tests in RegExp/CharacterClassEscapes/ advance to Pass or to a more specific bucket.
  • No regressions in the existing parser tests.

Related

Part of #69. This is the smallest and most narrowly scoped of the surfaced clusters.

Metadata

Labels: enhancement (New feature or request)