perf: cache has_jump flag and pass buffer in _pack_location by P403n1x87 · Pull Request #203 · MatthieuDartiailh/bytecode

P403n1x87 · 2026-05-28T11:28:27Z

Two independent micro-optimisations on the roundtrip hot path, benchmarked together:

Cache _is_jump on BaseInstr at construction time

has_jump() is called on every instruction in ControlFlowGraph.from_bytecode, BasicBlock.append, and _StackSizeComputer.run — the profiler showed it at ~2.9% own time, spending that time on opcode in HAS_JUMP (a set lookup) on every call.

Added _is_jump: bool to BaseInstr.__slots__, computed once in _set() (the canonical setter used by __init__ and the public set() method). All fast-path constructors that bypass _set() (copy(), _from_trusted() on both BaseInstr and ConcreteInstr, and _from_opcode() on ConcreteInstr) now copy or compute the flag directly. has_jump() becomes a single slot read.
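For readers unfamiliar with the pattern, here is a minimal sketch of caching a derived flag in a slotted class. It is not the library's code: the `Instr` class, its fields, and the opcode numbers in `HAS_JUMP` are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical stand-in for the library's jump-opcode set; the numbers are
# illustrative and not tied to any particular CPython version.
HAS_JUMP = frozenset({110, 114, 115})


class Instr:
    """Toy instruction class (an assumption, not the library's BaseInstr)."""

    __slots__ = ("_opcode", "_arg", "_is_jump")

    def __init__(self, opcode, arg=None):
        self._set(opcode, arg)

    def _set(self, opcode, arg):
        # Canonical setter: the only place the set lookup is performed.
        self._opcode = opcode
        self._arg = arg
        self._is_jump = opcode in HAS_JUMP

    def set(self, opcode, arg=None):
        # Public setter delegates to _set(), so the flag stays in sync.
        self._set(opcode, arg)

    def copy(self):
        # Fast path that bypasses _set(): it has to carry the flag itself,
        # otherwise the slot would be left unset on the new object.
        new = object.__new__(type(self))
        new._opcode = self._opcode
        new._arg = self._arg
        new._is_jump = self._is_jump
        return new

    def has_jump(self):
        # A single slot read instead of a set lookup per call.
        return self._is_jump
```

The cost of this design is that every constructor that skips the canonical setter must remember to populate the slot, which is why the PR touches each fast path explicitly.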

Eliminate per-call bytearray allocation in _pack_location

_assemble_locations previously collected one bytearray per location group via _push_locations -> _pack_location -> bytearray(), then joined them with b"".join(locations) at the end. Each location entry is only 2-6 bytes, so the list of small bytearrays and the final join were measurable overhead (_pack_location at ~3.9% own in the profiler).

Changed the signatures of _pack_location and _push_locations to accept a shared bytearray buf and extend it in place. _assemble_locations creates one bytearray() up front and converts it to bytes at the end, so no intermediate per-entry bytearrays are allocated.
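The difference between the two shapes can be sketched as follows. The function names and the two-byte entry encoding are hypothetical and much simpler than the real location table format; only the allocation pattern is the point.

```python
def pack_entry_alloc(code, length):
    # Old shape (illustrative): allocate and return a fresh bytearray per entry.
    return bytearray((code, length))


def assemble_join(entries):
    # Collect many tiny bytearrays, then join them at the end.
    parts = [pack_entry_alloc(code, length) for code, length in entries]
    return b"".join(parts)


def pack_entry_into(buf, code, length):
    # New shape (illustrative): write into the caller's buffer in place.
    buf.append(code)
    buf.append(length)


def assemble_shared(entries):
    # One buffer created up front, one conversion to bytes at the end.
    buf = bytearray()
    for code, length in entries:
        pack_entry_into(buf, code, length)
    return bytes(buf)
```

Both functions produce identical output; the shared-buffer version simply avoids the list of small objects and the final join.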

Benchmark (perf.py, Bytecode.from_code(dis).to_code(), 30 runs, p95 r/s):

| | p95 (r/s) | 95% CI |
|---|---|---|
| Baseline | 188 | [187, 188] |
| This PR | 196 | [195, 196] |

Delta: +8 r/s (+4.3%), Mann-Whitney p~0 (significant, threshold: p<0.01 and |delta|>=2%)
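As a quick check of the arithmetic, the relative delta and the stated significance gate can be reproduced as below. The helper names are hypothetical, and the p-value is taken as an input because the Mann-Whitney computation itself is not shown in this PR.

```python
def relative_delta(baseline, candidate):
    # Relative change of the candidate with respect to the baseline.
    return (candidate - baseline) / baseline


def is_significant(p_value, delta, p_threshold=0.01, min_delta=0.02):
    # Gate as stated in the PR: p < 0.01 and |delta| >= 2%.
    return p_value < p_threshold and abs(delta) >= min_delta


delta = relative_delta(188, 196)  # p95 values from the table above
print(f"{delta:+.1%}")
```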


codecov-commenter · 2026-05-28T11:30:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.45%. Comparing base (40b6bd4) to head (df38049).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #203   +/-   ##
=======================================
  Coverage   95.45%   95.45%           
=======================================
  Files           7        7           
  Lines        2132     2135    +3     
  Branches      459      459           
=======================================
+ Hits         2035     2038    +3     
  Misses         54       54           
  Partials       43       43


P403n1x87 marked this pull request as ready for review May 28, 2026 11:30

MatthieuDartiailh approved these changes May 28, 2026


MatthieuDartiailh merged commit 39a7993 into MatthieuDartiailh:main May 28, 2026
10 checks passed

MatthieuDartiailh merged 1 commit into MatthieuDartiailh:main from P403n1x87:perf/cached-jump-flag-pack-location-buffer
