engine: threaded VM dispatch (guaranteed tail calls) to cut the dispatch branch-miss bottleneck #601

@bpowers

Context

During the PR #599 VM-interpreter perf campaign on the C-LEARN model (the largest model: ~5,200 root slots, ~63k opcodes, 1000 Euler steps), a fresh perf profile showed the run is branch-mispredict-bound, not instruction-bound (IPC ~3.3). perf record -e branch-misses attributes ~68% of all branch-misses to Vm::eval_bytecode, specifically to its central dispatch loop, while pc < code.len() { match &code[pc] { ... } }, which lowers to a single data-dependent indirect branch (one jump table). C-LEARN's ~25k-opcode flow program is executed 1000×, and the target-history working set of that one indirect branch exceeds the BTB/predictor capacity, so it mispredicts heavily (an estimated ~15-18% of run cycles).
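For reference, the central-dispatch shape described above can be sketched as follows. This is a minimal illustration, not the engine's code: the `Op` enum, the stack layout, and `eval_central` are hypothetical names invented for the example.

```rust
// Hypothetical opcode set for illustration; not the engine's real bytecode.
#[derive(Clone, Copy, Debug)]
pub enum Op {
    PushConst(f64),
    Add,
    Mul,
    Halt,
}

/// Central-dispatch interpreter: every opcode funnels through one `match`.
/// Returns the top of the stack at halt, or `None` on stack underflow.
pub fn eval_central(code: &[Op]) -> Option<f64> {
    let mut stack: Vec<f64> = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        // A `match` like this typically lowers to one jump table, so all
        // opcodes share a single indirect branch (and its predictor state).
        match &code[pc] {
            Op::PushConst(v) => stack.push(*v),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a * b);
            }
            Op::Halt => break,
        }
        pc += 1;
    }
    stack.pop()
}
```

Calling `eval_central` on the program `PushConst(2.0), PushConst(3.0), Add, PushConst(4.0), Mul, Halt` evaluates (2 + 3) * 4.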

Opcode fusion (PR #599) chips away at this by cutting the dispatch count (the 3-operand and global/const fusions reduced branch-misses by ~0.95% and ~3.8% respectively), but the dispatch mechanism itself remains the structural bottleneck.
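To make the dispatch-count argument concrete, here is a rough sketch of a peephole fusion pass. It is not the fusion implemented in PR #599: the `Op` enum, the `AddConst` superinstruction, and `fuse` are hypothetical, and the sketch assumes a bytecode with no jump targets (a real pass would have to remap them).

```rust
// Hypothetical opcodes; `AddConst` is an invented superinstruction.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    PushConst(f64),
    Add,
    AddConst(f64), // fused form of `PushConst(c)` followed by `Add`
    Halt,
}

/// Rewrites each `PushConst(c), Add` pair into a single `AddConst(c)`,
/// so the interpreter dispatches once instead of twice for that pair.
pub fn fuse(code: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(code.len());
    let mut i = 0;
    while i < code.len() {
        match (code[i], code.get(i + 1)) {
            (Op::PushConst(c), Some(Op::Add)) => {
                out.push(Op::AddConst(c));
                i += 2; // consumed two original opcodes
            }
            (op, _) => {
                out.push(op);
                i += 1;
            }
        }
    }
    out
}
```

Each fused pair removes one trip through the dispatch branch, which is why fusion helps without changing the dispatch mechanism.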

Idea

Threaded dispatch. Instead of one central match, give each opcode handler its own continuation that dispatches the next opcode directly (each handler tail-calls the next via the opcode table), spreading the single indirect branch across many sites so the predictor can correlate per-opcode successors. This is the classic interpreter speedup (used by, for example, CPython's "computed goto" dispatch and LuaJIT's interpreter). In Rust the portable equivalent is guaranteed tail calls via the become keyword.
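The handler-table structure can be sketched on stable Rust as below. This is only an approximation of the idea: all names (`Vm`, `Instr`, `HANDLERS`, `run`, the `OP_*` constants) are hypothetical, and the ordinary calls here are not guaranteed tail calls, so stack depth may grow with program length. On nightly, the final call in `dispatch` would be written with `become` instead.

```rust
// Hypothetical mini-VM for illustration; not the engine's types.
pub struct Vm {
    pub stack: Vec<f64>,
}

#[derive(Clone, Copy)]
pub struct Instr {
    pub opcode: u8,
    pub operand: f64,
}

// All handlers share one signature so they can tail-call each other.
type Handler = fn(&mut Vm, &[Instr], usize) -> Option<f64>;

pub const OP_PUSH: u8 = 0;
pub const OP_ADD: u8 = 1;
pub const OP_MUL: u8 = 2;
pub const OP_HALT: u8 = 3;

// Opcode table indexed by `Instr::opcode`.
static HANDLERS: [Handler; 4] = [op_push, op_add, op_mul, op_halt];

// Inlined into every handler, so each handler ends with its own
// indirect call site rather than sharing one central branch.
#[inline(always)]
fn dispatch(vm: &mut Vm, code: &[Instr], pc: usize) -> Option<f64> {
    let instr = code.get(pc)?;
    let handler = *HANDLERS.get(instr.opcode as usize)?;
    // Plain call on stable; a `become` call would guarantee no stack growth.
    handler(vm, code, pc)
}

fn op_push(vm: &mut Vm, code: &[Instr], pc: usize) -> Option<f64> {
    vm.stack.push(code[pc].operand);
    dispatch(vm, code, pc + 1)
}

fn op_add(vm: &mut Vm, code: &[Instr], pc: usize) -> Option<f64> {
    let b = vm.stack.pop()?;
    let a = vm.stack.pop()?;
    vm.stack.push(a + b);
    dispatch(vm, code, pc + 1)
}

fn op_mul(vm: &mut Vm, code: &[Instr], pc: usize) -> Option<f64> {
    let b = vm.stack.pop()?;
    let a = vm.stack.pop()?;
    vm.stack.push(a * b);
    dispatch(vm, code, pc + 1)
}

fn op_halt(vm: &mut Vm, _code: &[Instr], _pc: usize) -> Option<f64> {
    vm.stack.pop()
}

/// Runs `code` from pc 0; returns the top of the stack at halt.
pub fn run(code: &[Instr]) -> Option<f64> {
    let mut vm = Vm { stack: Vec::new() };
    dispatch(&mut vm, code, 0)
}
```

Because every handler has the same signature and ends in a call through the table, swapping the plain call for a guaranteed tail call should be a local change, which is what would make a prototype spike cheap to try.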

Caveat / blocker

become is unstable (nightly Rust only, behind the explicit_tail_calls feature gate), so adopting it is a toolchain/policy decision for the project, not a code-level change. It is neither unsafe nor assembly. Until the project opts into nightly or become stabilizes, the only portable lever is more superinstructions (dispatch-count reduction).

Expected impact

Potentially the largest single remaining win (it attacks the 68%-of-branch-misses bottleneck directly), but unquantified and gated on the nightly decision. Worth a spike to measure on a become-based prototype before committing.
