
perf: cache has_jump flag and pass buffer in _pack_location #203

Merged

MatthieuDartiailh merged 1 commit into MatthieuDartiailh:main from P403n1x87:perf/cached-jump-flag-pack-location-buffer on May 28, 2026

@P403n1x87 (Contributor)

Two independent micro-optimisations on the roundtrip hot path, benchmarked
together:

1. Cache `_is_jump` on BaseInstr at construction time

`has_jump()` is called on every instruction in `ControlFlowGraph.from_bytecode`,
`BasicBlock.append`, and `_StackSizeComputer.run`. The profiler showed it at
~2.9% own time, spent evaluating `opcode in HAS_JUMP` (a set lookup) on every
call.

Added `_is_jump: bool` to `BaseInstr.__slots__`; it is computed once in `_set()`
(the canonical setter used by `__init__` and the public `set()` method). All
fast-path constructors that bypass `_set()` (`copy()`, `_from_trusted()` on
both `BaseInstr` and `ConcreteInstr`, and `_from_opcode()` on `ConcreteInstr`)
now copy or compute the flag directly, so `has_jump()` becomes a single slot
read.
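A minimal sketch of the caching pattern, assuming a simplified instruction that stores only an opcode and an argument. `SketchInstr` is a hypothetical stand-in for `BaseInstr`, not the library's actual code, and `HAS_JUMP` is rebuilt here from the standard `dis` module's jump tables. The set lookup happens only in the canonical setter; the fast-path `copy()` reuses the cached flag.

```python
import dis

# Opcodes that carry a jump target. dis.hasjump exists on Python 3.13+;
# older versions split jumps into relative and absolute tables.
if hasattr(dis, "hasjump"):
    HAS_JUMP = frozenset(dis.hasjump)
else:
    HAS_JUMP = frozenset(dis.hasjrel) | frozenset(dis.hasjabs)


class SketchInstr:
    """Hypothetical stand-in for BaseInstr, illustrating the cached flag."""

    __slots__ = ("_opcode", "_arg", "_is_jump")

    def __init__(self, opcode: int, arg: object = None) -> None:
        self._set(opcode, arg)

    def _set(self, opcode: int, arg: object) -> None:
        # Canonical setter: the only place the set membership test runs.
        self._opcode = opcode
        self._arg = arg
        self._is_jump = opcode in HAS_JUMP

    def set(self, opcode: int, arg: object = None) -> None:
        # The public mutator goes through _set(), keeping the flag in sync.
        self._set(opcode, arg)

    def copy(self) -> "SketchInstr":
        # Fast path that bypasses _set(): copy the cached flag, don't recompute it.
        new = object.__new__(type(self))
        new._opcode = self._opcode
        new._arg = self._arg
        new._is_jump = self._is_jump
        return new

    def has_jump(self) -> bool:
        # A single slot read instead of a set lookup.
        return self._is_jump
```

For example, `SketchInstr(dis.opmap["JUMP_FORWARD"], 0).has_jump()` is `True`, and `SketchInstr(dis.opmap["NOP"]).has_jump()` is `False`. The trade-off is one extra slot per instruction in exchange for removing the lookup from every `has_jump()` call.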

2. Eliminate per-call `bytearray` allocation in `_pack_location`

`_assemble_locations` previously collected one `bytearray` per location group
via `_push_locations -> _pack_location -> bytearray()`, then joined them with
`b"".join(locations)` at the end. Each location entry is only 2-6 bytes, so
building a list of small bytearrays and joining it was measurable overhead
(the profiler showed `_pack_location` at ~3.9% own time).

Changed the signature of `_pack_location` and `_push_locations` to accept a
shared `bytearray` (`buf`) and extend it in place. `_assemble_locations`
creates one `bytearray()` up front and converts it to `bytes` at the end, so
there are no per-entry allocations (only the buffer's own amortised growth
and the final copy to `bytes`).

Benchmark (perf.py, Bytecode.from_code(dis).to_code(), 30 runs, p95 r/s):

| Build | p95 (r/s) | 95% CI (r/s) |
|---|---|---|
| Baseline | 188 | [187, 188] |
| This PR | 196 | [195, 196] |

Delta: +8 r/s (+4.3%); Mann-Whitney p ≈ 0 (significant; threshold: p < 0.01 and |delta| >= 2%)
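The headline delta can be rechecked from the table. This hypothetical helper applies the stated decision rule (p < 0.01 and |delta| >= 2%) to the p95 figures; the p-value is an input, and the `1e-6` used below is only a stand-in for the reported "p ≈ 0", not a measured value.

```python
def relative_delta(baseline: float, candidate: float) -> float:
    """Relative change of candidate over baseline (0.043 means +4.3%)."""
    return (candidate - baseline) / baseline


def is_significant(baseline: float, candidate: float, p_value: float,
                   alpha: float = 0.01, min_rel_delta: float = 0.02) -> bool:
    # Both conditions must hold: p below alpha and an effect of at least 2%.
    return p_value < alpha and abs(relative_delta(baseline, candidate)) >= min_rel_delta


# p95 runs per second from the table: baseline 188, this PR 196.
delta = relative_delta(188, 196)
print(f"{196 - 188:+d} r/s ({delta:+.1%})")  # prints "+8 r/s (+4.3%)"
print(is_significant(188, 196, p_value=1e-6))  # stand-in p-value; prints "True"
```

Requiring a minimum effect size alongside the p-value keeps statistically detectable but practically negligible changes (for example 188 to 190, about +1.1%) from being reported as wins.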

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.45%. Comparing base (40b6bd4) to head (df38049).

@@           Coverage Diff           @@
##             main     #203   +/-   ##
=======================================
  Coverage   95.45%   95.45%           
=======================================
  Files           7        7           
  Lines        2132     2135    +3     
  Branches      459      459           
=======================================
+ Hits         2035     2038    +3     
  Misses         54       54           
  Partials       43       43           

P403n1x87 marked this pull request as ready for review on May 28, 2026 11:30.
MatthieuDartiailh merged commit 39a7993 into MatthieuDartiailh:main on May 28, 2026.
10 checks passed