Restrict char \xNN to ASCII range (B6) #372
Open
TumCucTom wants to merge 6 commits into
Direct backend lowering diagnostics reported a generic "direct backend local type is unsupported" message regardless of the actual target, even though the expected and help fields already named the target. Replace the generic "direct backend " prefix with a target-specific label (COFF x64, ELF64, AArch64 Mach-O, x86_64 Mach-O, COFF AArch64, AArch64 ELF) when the message originates from IR lowering. Cover the new behavior with regression tests for the linux-musl-x64, win32-x64.exe, and darwin-x64 targets in the owned-drop-direct-backend-unsupported fixture.

Co-Authored-By: Zippy AI <tomkinsbale@icloud.com>
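A minimal sketch of the prefix substitution, assuming a hypothetical `BackendTarget` enum and a hypothetical `label_lowering_message` helper (neither appears in the PR). The labels are the ones listed in the commit message; the exact rewritten wording (label followed directly by the rest of the message) is an assumption, since the PR does not show the final text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical target enum; the compiler's real identifiers are not shown in the PR. */
typedef enum {
    TARGET_COFF_X64,
    TARGET_ELF64,
    TARGET_MACHO_AARCH64,
    TARGET_MACHO_X64,
    TARGET_COFF_AARCH64,
    TARGET_ELF_AARCH64
} BackendTarget;

/* Target-specific labels, as listed in the commit message. */
static const char *backend_label(BackendTarget target) {
    switch (target) {
    case TARGET_COFF_X64:      return "COFF x64";
    case TARGET_ELF64:         return "ELF64";
    case TARGET_MACHO_AARCH64: return "AArch64 Mach-O";
    case TARGET_MACHO_X64:     return "x86_64 Mach-O";
    case TARGET_COFF_AARCH64:  return "COFF AArch64";
    case TARGET_ELF_AARCH64:   return "AArch64 ELF";
    }
    return NULL;
}

/* Hypothetical helper: copy `message` into `out`, replacing a leading
   "direct backend " with the target label, but only for diagnostics
   raised during IR lowering. Other messages are copied unchanged. */
static void label_lowering_message(BackendTarget target, bool from_ir_lowering,
                                   const char *message, char *out, size_t out_size) {
    static const char generic[] = "direct backend ";
    const size_t generic_len = sizeof generic - 1;
    const char *label = backend_label(target);

    assert(message != NULL && out != NULL && out_size > 0);
    if (from_ir_lowering && label != NULL &&
        strncmp(message, generic, generic_len) == 0) {
        snprintf(out, out_size, "%s %s", label, message + generic_len);
    } else {
        snprintf(out, out_size, "%s", message);
    }
}
```

Keying the rewrite on the IR-lowering origin leaves any other diagnostic that happens to start with "direct backend " untouched.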
…ring-diagnostics # Conflicts: # scripts/compiler-metrics.mts
Float literals accepted `digits '.' digits` but rejected any input containing `_`, while integer literals accept `1_000.0`-style separators. The asymmetry forces numeric groupings to be omitted in float sources and produces a confusing TYP019 for `1_000.5`. Mirror the integer path: copy the source into a stack buffer, reject leading, trailing, and double underscores, then run the existing digit shape check and `strtod` on the stripped text. The behavior stays byte-equivalent for inputs without `_`. Adds a positive conformance fixture (`float-literal-underscores.0`) and a negative one (`malformed-float-underscores.0` covers trailing underscore). Bumps the `checker.c` line budget by 20.
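The underscore handling described above can be sketched as follows. `parse_float_literal` and `FLOAT_LITERAL_MAX` are hypothetical names, and the buffer size is an assumption; the real code in `checker.c` is not shown. The sketch rejects leading, trailing, and doubled underscores, strips single separators into a stack buffer, then runs a `digits '.' digits` shape check and `strtod` on the stripped text. The commit message does not say whether an underscore next to `.` (as in `1_.5`) is rejected; this sketch accepts it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stack-buffer size; the real limit is not stated in the PR. */
#define FLOAT_LITERAL_MAX 64

/* Returns true and stores the value in *out if text[0..len) is a float
   literal of shape digits '.' digits with optional single '_' separators. */
static bool parse_float_literal(const char *text, size_t len, double *out) {
    char buf[FLOAT_LITERAL_MAX];
    size_t n = 0;
    size_t pos = 0;
    size_t frac_start;

    assert(text != NULL && out != NULL);
    if (len == 0 || len >= sizeof buf) return false;
    /* Reject leading and trailing underscores. */
    if (text[0] == '_' || text[len - 1] == '_') return false;

    /* Copy into the stack buffer, dropping single separators.
       text[i + 1] is in range: a '_' can never be the last byte here. */
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '_') {
            if (text[i + 1] == '_') return false; /* double underscore */
            continue;
        }
        buf[n++] = text[i];
    }
    buf[n] = '\0';

    /* Shape check on the stripped text: digits '.' digits. */
    while (isdigit((unsigned char)buf[pos])) pos++;
    if (pos == 0 || buf[pos] != '.') return false;
    frac_start = ++pos;
    while (isdigit((unsigned char)buf[pos])) pos++;
    if (pos == frac_start || buf[pos] != '\0') return false;

    *out = strtod(buf, NULL);
    return true;
}
```

Inputs without `_` skip every underscore branch and reach the same shape check and `strtod` call, which is how the byte-equivalence claim holds.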
The string-literal decoder's catch-all branch appended the byte after `\` verbatim. Three findings flowed from that one gap:

- `"\q"` silently decoded to `"q"` (A2)
- `"\0"` silently decoded to `"0"` (A3)
- `"\u0041"` silently decoded to `"u0041"` (B8)

The char scanner at `canonical_text.c:220-222` already whitelists the known escapes and rejects everything else with PAR100. Mirror that policy in the string decoder: explicitly accept `\n`, `\r`, `\t`, and `\xNN`; pass through the self-escapes `\\`, `\'`, and `\"`; reject everything else with PAR100. `\x00` continues to be rejected (a NUL byte would truncate a C string), which now matches `\0`. Adds a positive conformance fixture (`string-escape-canonical.0`) that asserts `\n`, `\t`, `\"`, `\\`, and `\x41` all decode to the expected bytes, and two negative fixtures (`string-unknown-escape.0`, `string-null-escape.0`) that lock the PAR100 path. Bumps the `canonical_text_program.c` line budget by 17.
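A sketch of the whitelist policy, assuming a hypothetical `decode_string_escape` helper that receives the bytes after the backslash and returns how many it consumed, with 0 standing in for the PAR100 error. Accepting both upper- and lower-case hex digits is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: `p` points just after the backslash, `remaining`
   bytes are available. Returns the number of bytes consumed, or 0 to
   signal PAR100 (unknown or forbidden escape). */
static size_t decode_string_escape(const char *p, size_t remaining, char *out_byte) {
    assert(p != NULL && out_byte != NULL);
    if (remaining == 0) return 0;
    switch (p[0]) {
    case 'n': *out_byte = '\n'; return 1;
    case 'r': *out_byte = '\r'; return 1;
    case 't': *out_byte = '\t'; return 1;
    case '\\': case '\'': case '"':
        *out_byte = p[0]; /* self-escapes pass through */
        return 1;
    case 'x': {
        int hi, lo;
        if (remaining < 3) return 0;
        hi = hex_value(p[1]);
        lo = hex_value(p[2]);
        if (hi < 0 || lo < 0) return 0;
        if (hi == 0 && lo == 0) return 0; /* \x00 would truncate a C string */
        *out_byte = (char)(hi * 16 + lo);
        return 3;
    }
    default:
        return 0; /* \q, \0, \u0041, ...: rejected with PAR100 */
    }
}
```

Returning 0 for everything outside the whitelist is what closes the catch-all gap: no escape can fall through to a verbatim copy.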
The char scanner accepted any byte value for `\xNN` (0x00..0xFF), but raw non-ASCII bytes were rejected at the same site. The asymmetry let sources build with `\xFF` and fail only when the user typed the byte directly. Match the raw-byte restriction in the `\x` branch: parse the two hex digits, then fail with the existing "character literal must be one byte" diagnostic if the value is >= 0x80. `\x00..\x7F` still decode to those bytes (the char scanner's `\0` escape continues to map to NUL, so users can produce a NUL char either way). Adds two negative conformance fixtures that lock the boundary: `char-hex-7f.0` covers `\x80` (the first rejected value) and `char-hex-high.0` covers `\xFF`. Bumps the `canonical_text.c` line budget by 12.
@TumCucTom is attempting to deploy a commit to the Vercel Labs Team on Vercel. A member of the Team first needs to authorize it.
Summary
The char scanner accepted any byte value for `\xNN` (0x00..0xFF), but raw non-ASCII bytes were rejected at the same site. The asymmetry let sources build with `\xFF` and fail only when the user typed the byte directly.

This PR matches the raw-byte restriction in the `\x` branch: parse the two hex digits, then fail with the existing "character literal must be one byte" diagnostic if the value is `>= 0x80`. `\x00..\x7F` still decode to those bytes; the char scanner's `\0` escape continues to map to NUL, so users can produce a NUL char either way.

Closes one item from #318 (audit finding B6, re-confirmed against `main`).

Test plan
- `conformance/native/fail/char-hex-7f.0` covers `\x80` (the first rejected value)
- `conformance/native/fail/char-hex-high.0` covers `\xFF`
- `pnpm run conformance` passes
- `compiler-metrics.mts` has 0 violations (budget bumped by 12)
- … (`-Wall -Wextra -Wpedantic`)