[codex] restore CPU_SIM fallback storage for split TPOP tiles#59
[codex] restore CPU_SIM fallback storage for split TPOP tiles#59
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the Tile class in include/pto/common/pto_tile.hpp by introducing a private ensureStorage method to handle lazy initialization of the internal buffer, which is now used by both non-const and newly added const data() accessors. The internal storage members are marked mutable to support this. Additionally, the preprocessor conditions for these features were simplified, and a new test case was added to tests/cpu/st/testcase/tpushpop/main.cpp. A review comment points out that ensureStorage lacks thread safety, which could lead to race conditions if Tile instances are shared between threads during CPU simulation.
| AICORE void ensureStorage() const | ||
| { | ||
| if (!data_) { | ||
| internalBuffer.resize(Rows * Cols); | ||
| data_ = internalBuffer.data(); | ||
| } | ||
| } |
There was a problem hiding this comment.
The ensureStorage method is not thread-safe. In a multi-threaded CPU simulation environment, if multiple threads access the same Tile instance and call data() for the first time simultaneously, a race condition could occur during the internalBuffer.resize and data_ assignment. While Tile objects are often local to a subblock, if they are ever shared across threads in __CPU_SIM, this could lead to undefined behavior or crashes.
Summary
Tileobjects even when__PTO_AUTO__is not definedTASSIGNredirection intact while addingconst/non-const accessors that allocate only when no backing pointer has been assignedtpushpopCPU regression test that exercises splitTPOPinto unassigned destination tiles across dual subblocksRoot cause
The regression in issue #50 came from
include/pto/common/pto_tile.hppafterd940c05b: CPU_SIM tiles only kept internal lazy storage when both__CPU_SIMand__PTO_AUTO__were defined. In normal CPU_SIM builds used bya5sim, splitTPOPconsumers could therefore keep a null backing pointer unless the kernel explicitly calledTASSIGN.The failing BGEMM path in
simplerdeclares a splitTPOPdestination tile withoutTASSIGN. The CPU path reachesTPop.hpp -> CopyLinearToTile, which writes throughdst.data()[...], so the null tile backing pointer becomes a segmentation fault during execution.This patch restores the earlier CPU_SIM fallback-storage behavior at the
Tilelayer, which is the correct upstream fix. Downstream kernels no longer need to addTASSIGNjust to avoid the regression.Validation
python3 tests/run_cpu.py --testcase tpushpop --gtest_filter 'TPushPopTest.a5_style_c2v_dual_subblock_split_push_pop_without_tassign_dst' --generator Ninja --build-dir build/cpu_tpushpop_regression --cleanpython3 tests/run_cpu.py --testcase tpushpop --generator Ninja --build-dir build/cpu_tpushpop_full --cleanCC=/opt/homebrew/bin/gcc-15 CXX=/opt/homebrew/bin/g++-15 PTO_ISA_ROOT=/Users/zhoubot/github/pto-isa /Users/zhoubot/github/pto-orgs/simpler/.venv313/bin/python examples/scripts/run_example.py --build -k examples/a5/tensormap_and_ringbuffer/bgemm/kernels -g examples/a5/tensormap_and_ringbuffer/bgemm/golden.py -p a5sim --log-level warnFixes #50