Restrict char `\xNN` to ASCII range (B6) by TumCucTom · Pull Request #372 · vercel-labs/zerolang

TumCucTom · 2026-06-04T09:30:57Z

Summary

The char scanner accepted any byte value for `\xNN` (0x00..0xFF), but raw non-ASCII bytes were rejected at the same site. The asymmetry let sources build with `\xFF` and fail only when the user typed the byte directly.

This PR applies the raw-byte restriction to the `\x` branch: parse the two hex digits, then fail with the existing "character literal must be one byte" diagnostic if the value is >= 0x80. `\x00`..`\x7F` still decode to those bytes, and the char scanner's `\0` escape continues to map to NUL, so users can produce a NUL char either way.

Closes one item from #318 (audit finding B6, re-confirmed against main).

Test plan

conformance/native/fail/char-hex-7f.0 covers `\x80` (the first rejected value)
conformance/native/fail/char-hex-high.0 covers `\xFF`
pnpm run conformance passes
compiler-metrics.mts has 0 violations (budget bumped by 12)
No C compiler warnings (verified with -Wall -Wextra -Wpedantic)

Direct backend lowering diagnostics reported a generic "direct backend local type is unsupported" message regardless of the actual target, even though the expected and help fields already named the target. Replace the generic "direct backend " prefix with a target-specific label (COFF x64, ELF64, AArch64 Mach-O, x86_64 Mach-O, COFF AArch64, AArch64 ELF) when the message originates from IR lowering. Cover the new behavior with regression tests for the linux-musl-x64, win32-x64.exe, and darwin-x64 targets in the owned-drop-direct-backend-unsupported fixture.

Co-Authored-By: Zippy AI <tomkinsbale@icloud.com>


Float literals accepted `digits '.' digits` but rejected any input containing `_`, while integer literals accept `1_000.0`-style separators. The asymmetry forces numeric groupings to be omitted in float sources and produces a confusing TYP019 for `1_000.5`. Mirror the integer path: copy the source into a stack buffer, reject leading, trailing, and double underscores, then run the existing digit shape check and `strtod` on the stripped text. The behavior stays byte-equivalent for inputs without `_`. Adds a positive conformance fixture (`float-literal-underscores.0`) and a negative one (`malformed-float-underscores.0` covers trailing underscore). Bumps the `checker.c` line budget by 20.

The string-literal decoder's catch-all branch appended the byte after `\` verbatim. Three findings flowed from that one gap:

- `"\q"` silently decoded to `"q"` (A2)
- `"\0"` silently decoded to `"0"` (A3)
- `"\u0041"` silently decoded to `"u0041"` (B8)

The char scanner at `canonical_text.c:220-222` already whitelists the known escapes and rejects everything else with PAR100. Mirror that policy in the string decoder: explicitly accept `n`, `r`, `t`, and `x..`; pass through the self-escaped `\`, `'`, and `"`; reject everything else with PAR100. `\x00` continues to be rejected (a NUL byte would truncate a C string), which now matches `\0`. Adds a positive conformance fixture (`string-escape-canonical.0`) that asserts `\n`, `\t`, `\"`, `\\`, and `\x41` all decode to the expected bytes, and two negative fixtures (`string-unknown-escape.0`, `string-null-escape.0`) that lock the PAR100 path. Bumps the `canonical_text_program.c` line budget by 17.

The char scanner accepted any byte value for `\xNN` (0x00..0xFF), but raw non-ASCII bytes were rejected at the same site. The asymmetry let sources build with `\xFF` and fail only when the user typed the byte directly. Apply the raw-byte restriction to the `\x` branch: parse the two hex digits, then fail with the existing "character literal must be one byte" diagnostic if the value is >= 0x80. `\x00..\x7F` still decode to those bytes (the char scanner's `\0` escape continues to map to NUL, so users can produce a NUL char either way). Adds two negative conformance fixtures that lock the boundary: `char-hex-7f.0` covers `\x80` (the first rejected value) and `char-hex-high.0` covers `\xFF`. Bumps the `canonical_text.c` line budget by 12.

vercel · 2026-06-04T09:31:01Z

@TumCucTom is attempting to deploy a commit to the Vercel Labs Team on Vercel.

A member of the Team first needs to authorize it.

TumCucTom and others added 5 commits June 4, 2026 01:38

Merge remote-tracking branch 'origin/main' into fix/target-aware-lowe…

bcb293f

…ring-diagnostics # Conflicts: # scripts/compiler-metrics.mts

Merge origin/main into fix/b6-char-x-range

ce1d368

TumCucTom wants to merge 6 commits into vercel-labs:main from TumCucTom:fix/b6-char-x-range
