Restrict char \xNN to ASCII range (B6) #372

Open
TumCucTom wants to merge 6 commits into vercel-labs:main from TumCucTom:fix/b6-char-x-range


@TumCucTom

Summary

The char scanner accepted any byte value for \\xNN (0x00..0xFF), but raw non-ASCII bytes were rejected at the same site. The asymmetry let sources build with \\xFF and fail only when the user typed the byte directly.

This PR matches the raw-byte restriction in the \\x branch: parse the two hex digits, then fail with the existing "character literal must be one byte" diagnostic if the value is >= 0x80. \\x00..\\x7F still decode to those bytes; the char scanner's \\0 escape continues to map to NUL, so users can produce a NUL char either way.
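
For reviewers who want the rule in code form, here is a minimal sketch of the check described above. It is not the patch to `canonical_text.c`: the names `hex_digit_value` and `decode_char_hex_escape` are hypothetical, and a plain `false` return stands in for emitting the "character literal must be one byte" diagnostic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode the two digits that follow "\x" in a char literal.
 * Returns true and stores the byte on success. Returns false for
 * malformed digits and for values outside ASCII (>= 0x80); the real
 * scanner would report a diagnostic at that point. */
static bool decode_char_hex_escape(const char *digits, unsigned char *out) {
    assert(digits != NULL && out != NULL);
    int hi = hex_digit_value(digits[0]);
    if (hi < 0) return false;            /* also stops at end of input */
    int lo = hex_digit_value(digits[1]);
    if (lo < 0) return false;
    int value = hi * 16 + lo;
    if (value >= 0x80) return false;     /* same limit as raw bytes */
    *out = (unsigned char)value;
    return true;
}
```

Under this sketch, `"7F"` is the last accepted input and `"80"` the first rejected one, which is the boundary the two new fixtures are meant to lock.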

Closes one item from #318 (audit finding B6, re-confirmed against main).

Test plan

  • conformance/native/fail/char-hex-7f.0 covers \\x80 (the first rejected value)
  • conformance/native/fail/char-hex-high.0 covers \\xFF
  • pnpm run conformance passes
  • compiler-metrics.mts has 0 violations (budget bumped by 12)
  • No C compiler warnings (verified with -Wall -Wextra -Wpedantic)

TumCucTom and others added 5 commits June 4, 2026 01:38
Direct backend lowering diagnostics reported a generic "direct backend
local type is unsupported" message regardless of the actual target,
even though the expected and help fields already named the target.
Substitute the generic "direct backend " prefix with a target-specific
label (COFF x64, ELF64, AArch64 Mach-O, x86_64 Mach-O, COFF AArch64,
AArch64 ELF) when the message originates from IR lowering.

Cover the new behavior with regression tests for the linux-musl-x64,
win32-x64.exe, and darwin-x64 targets in the
owned-drop-direct-backend-unsupported fixture.

Co-Authored-By: Zippy AI <tomkinsbale@icloud.com>
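
A rough sketch of the prefix substitution described in this commit message, under stated assumptions: the function name `relabel_diagnostic` is hypothetical, the target label is passed in by the caller rather than derived from a target triple, and the exact output spacing (label, one space, rest of the message) is a guess rather than something the commit message specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if message starts with the generic
 * "direct backend " prefix, write it to out with that prefix replaced
 * by the target-specific label; otherwise copy it unchanged.
 * Output is truncated to cap bytes by snprintf. */
static void relabel_diagnostic(const char *message, const char *label,
                               char *out, size_t cap) {
    static const char generic[] = "direct backend ";
    const size_t prefix_len = sizeof generic - 1;
    assert(message != NULL && label != NULL && out != NULL);
    if (strncmp(message, generic, prefix_len) == 0) {
        snprintf(out, cap, "%s %s", label, message + prefix_len);
    } else {
        snprintf(out, cap, "%s", message);
    }
}
```

Messages that do not begin with the generic prefix pass through untouched, so diagnostics from other stages would keep their original wording in this sketch.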
…ring-diagnostics

# Conflicts:
#	scripts/compiler-metrics.mts
Float literals accepted `digits '.' digits` but rejected any input
containing `_`, while integer literals accept `1_000.0`-style separators.
The asymmetry forces numeric groupings to be omitted in float sources and
produces a confusing TYP019 for `1_000.5`.

Mirror the integer path: copy the source into a stack buffer, reject
leading, trailing, and double underscores, then run the existing digit
shape check and `strtod` on the stripped text. The behavior stays
byte-equivalent for inputs without `_`.

Adds a positive conformance fixture (`float-literal-underscores.0`) and
a negative one (`malformed-float-underscores.0` covers trailing
underscore). Bumps the `checker.c` line budget by 20.
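
The steps in this commit message can be sketched roughly as follows. This is an illustration, not the code in `checker.c`: `parse_float_literal` and `FLOAT_TEXT_MAX` are hypothetical names, a `false` return stands in for the diagnostic, and the sketch takes no position on underscores adjacent to the `.` because the commit message does not say how those are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { FLOAT_TEXT_MAX = 64 };  /* hypothetical stack buffer size */

/* Strip '_' separators, then validate `digits '.' digits` and convert. */
static bool parse_float_literal(const char *src, double *out) {
    assert(src != NULL && out != NULL);
    size_t len = strlen(src);
    if (len == 0 || len >= FLOAT_TEXT_MAX) return false;
    if (src[0] == '_' || src[len - 1] == '_') return false;  /* leading/trailing */

    char stripped[FLOAT_TEXT_MAX];
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '_') {
            if (src[i + 1] == '_') return false;             /* double underscore */
            continue;
        }
        stripped[n++] = src[i];
    }
    stripped[n] = '\0';

    /* Digit shape check on the stripped text. */
    size_t i = 0, int_digits = 0, frac_digits = 0;
    while (stripped[i] >= '0' && stripped[i] <= '9') { i++; int_digits++; }
    if (int_digits == 0 || stripped[i] != '.') return false;
    i++;
    while (stripped[i] >= '0' && stripped[i] <= '9') { i++; frac_digits++; }
    if (frac_digits == 0 || stripped[i] != '\0') return false;

    *out = strtod(stripped, NULL);
    return true;
}
```

For input without `_`, the stripped buffer is a byte-for-byte copy of the source, so the shape check and `strtod` see the same text as before.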
The string-literal decoder's catch-all branch appended the byte after
`\\` verbatim. Three findings flowed from that one gap:

- `"\q"` silently decoded to `"q"` (A2)
- `"\0"` silently decoded to `"0"` (A3)
- `"\u0041"` silently decoded to `"u0041"` (B8)

The char scanner at `canonical_text.c:220-222` already whitelists the
known escapes and rejects everything else with PAR100. Mirror that
policy in the string decoder: explicitly accept `n`, `r`, `t`, and
`x..`; pass through the self-escaped `\\`, `'`, `"`; reject everything
else with PAR100. `\\x00` continues to be rejected (a NUL byte would
truncate a C string), which now matches `\\0`.

Adds a positive conformance fixture (`string-escape-canonical.0`) that
asserts `\\n`, `\\t`, `\\"`, `\\\\`, and `\\x41` all decode to the
expected bytes, and two negative fixtures
(`string-unknown-escape.0`, `string-null-escape.0`) that lock the
PAR100 path. Bumps the `canonical_text_program.c` line budget by 17.
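
A minimal sketch of the whitelist policy this commit message describes, assuming a decoder that works on the literal body without its surrounding quotes. The names `hex_value` and `decode_string_body` are hypothetical, and a `-1` return stands in for reporting PAR100; the real decoder in `canonical_text_program.c` may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode src into out (NUL-terminated). Returns the decoded length,
 * or -1 where the real decoder would reject the literal. */
static long decode_string_body(const char *src, char *out, size_t cap) {
    assert(src != NULL && out != NULL);
    if (cap == 0) return -1;
    size_t n = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        if (c == '\\') {
            char e = src[++i];
            switch (e) {
            case 'n': c = '\n'; break;
            case 'r': c = '\r'; break;
            case 't': c = '\t'; break;
            case '\\': case '\'': case '"': c = e; break;  /* self-escaped */
            case 'x': {
                int hi = hex_value(src[i + 1]);
                if (hi < 0) return -1;
                int lo = hex_value(src[i + 2]);
                if (lo < 0) return -1;
                int v = hi * 16 + lo;
                if (v == 0) return -1;   /* NUL would truncate a C string */
                c = (char)v;
                i += 2;
                break;
            }
            default:
                return -1;               /* \q, \0, \u..., trailing backslash */
            }
        }
        if (n + 1 >= cap) return -1;     /* keep room for the terminator */
        out[n++] = c;
    }
    out[n] = '\0';
    return (long)n;
}
```

The `default` arm is the point of the change: instead of appending the byte after the backslash verbatim, every escape not on the whitelist is rejected.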
The char scanner accepted any byte value for `\\xNN` (0x00..0xFF), but
raw non-ASCII bytes were rejected at the same site. The asymmetry let
sources build with `\\xFF` and fail only when the user typed the byte
directly.

Match the raw-byte restriction in the `\\x` branch: parse the two hex
digits, then fail with the existing "character literal must be one
byte" diagnostic if the value is >= 0x80. `\\x00..\\x7F` still decode
to those bytes (the char scanner's `\\0` escape continues to map to
NUL, so users can produce a NUL char either way).

Adds two negative conformance fixtures that lock the boundary:
`char-hex-7f.0` covers `\\x80` (the first rejected value) and
`char-hex-high.0` covers `\\xFF`. Bumps the `canonical_text.c` line
budget by 12.
@vercel

vercel Bot commented Jun 4, 2026

@TumCucTom is attempting to deploy a commit to the Vercel Labs Team on Vercel.

A member of the Team first needs to authorize it.
