Skip to content

fix(merge): pin classic locale on CSV streams so integers can't get grouped#17

Merged
mledour merged 1 commit into
mainfrom
fix/csv-locale-classic
Jun 8, 2026
Merged

fix(merge): pin classic locale on CSV streams so integers can't get grouped#17
mledour merged 1 commit into
mainfrom
fix/csv-locale-classic

Conversation

@mledour

@mledour mledour commented Jun 8, 2026

Copy link
Copy Markdown
Owner

Finding #6 (code review): CSV corruption under a non-"C" global locale

Floats are written via fmt (locale-independent), but the integer columns — the # qpc_freq= header, display_time, thread_id, and the qpc/gpu values — go through std::ostream operator<<, which honors the stream's locale. No CSV stream was ever imbued. A host that installs a thousands-grouping global locale (some middleware / Qt launchers do, via std::locale::global(std::locale(""))) would make us emit 1,234,567, injecting spurious fields and corrupting both the per-side and merged CSVs — and breaking byte-equivalence with analyze.py.

Fix

imbue(std::locale::classic()) on every CSV read/write stream:

Stream Site
per-side writer (FrameCsvSink / GpuCsvSink) layer.cpp ensureOpen
merged-CSV writer merge.cpp WriteMergedCsv
per-row parsers merge.cpp ReadFrameCsv / ReadGpuCsv stringstreams

analyze.py needs no change — Python's int()/str()/csv don't honor the C locale — so the two paths stay byte-equivalent. (<locale> added to pch.h + merge.cpp.)

Test

New test_merge.cpp case installs a grouping global locale (portable custom std::numpunct, RAII-restored so it can't leak into other cases), confirms the locale is active, then asserts WriteMergedCsv still emits raw integers (1234567,89012, present; 1,234,567 / 89,012 absent). It runs in the CI test binary (which CI executes and fails on non-zero exit).

Pure additions (+68, no deletions); no lines >120 cols.

🤖 Generated with Claude Code

…rouped

Floats are written via fmt (locale-independent), but the integer columns
(qpc_freq header, display_time, thread_id, qpc/gpu values) go through
std::ostream operator<<, which uses the stream's locale. No CSV stream was
ever imbued, so a host that installed a thousands-grouping global locale --
some middleware / Qt launchers do, via std::locale::global(std::locale("")) --
would emit "1,234,567", injecting spurious fields and corrupting both the
per-side and merged CSVs (and the byte-equivalence with analyze.py, whose
Python side is always locale-independent).

Imbue std::locale::classic() on every CSV read/write stream: the FrameCsvSink
/ GpuCsvSink writer, WriteMergedCsv's output, and the two per-row parse
stringstreams in ReadFrameCsv / ReadGpuCsv. analyze.py needs no change
(Python int()/str()/csv don't honor the C locale), so the two paths stay
byte-equivalent.

Adds a test that installs a grouping global locale (portable custom
std::numpunct, RAII-restored so it can't leak into other cases) and asserts
WriteMergedCsv still emits raw integers; it runs in the CI test binary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mledour mledour merged commit 5ce0176 into main Jun 8, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants