Add opt-in recurrent profiling instrumentation #1318

Open
xxxkkw wants to merge 1 commit into ml-explore:main from xxxkkw:recurrent-profile-instrumentation

xxxkkw commented May 28, 2026

Summary

  • Add opt-in recurrent profiling instrumentation behind MLX_LM_PROFILE_RECURRENT.
  • Emit structured JSON timing records for recurrent paths including Mamba, SSM, gated-delta, and RecurrentGemma code paths.
  • Keep default behavior unchanged when the environment variable is unset.
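The opt-in behavior described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: only the environment variable name comes from the summary, while the helper names `profiling_enabled` and `profile_recurrent`, the record fields, and the rule for which values count as "enabled" are assumptions.

```python
import json
import os
import sys
import time
from contextlib import contextmanager

ENV_VAR = "MLX_LM_PROFILE_RECURRENT"  # name taken from the PR summary


def profiling_enabled() -> bool:
    # Assumption: any non-empty value other than "0" turns profiling on.
    return os.environ.get(ENV_VAR, "") not in ("", "0")


@contextmanager
def profile_recurrent(kind, path, stream=None):
    """Hypothetical helper: time a block and emit one JSON record if enabled."""
    if not profiling_enabled():
        # Default path: no timing and no output.
        yield
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        record = {
            "kind": kind,  # hypothetical field, e.g. "ssm_attn"
            "path": path,  # hypothetical field: which implementation ran
            "elapsed_s": time.perf_counter() - start,
        }
        out = stream if stream is not None else sys.stderr
        print(json.dumps(record), file=out)
```

Because MLX evaluates arrays lazily, a timer like this only measures real work if the result is evaluated inside the timed block (for example with `mx.eval`). Whether and how the PR does that is not stated in this description.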

Environment

  • macOS Darwin 25.4.0 arm64
  • Apple M1 Max, 32 GB unified memory
  • MLX / mlx-lm virtual environment on Apple Silicon

Real-weight benchmark

Model: mlx-community/mamba2-130m-4bit with a config override for intermediate_size=1536, matching the model config's expand * hidden_size.

Protocol: 100K input tokens and 10K generated tokens, prefill_step_size=2048, greedy generation. This model was used because larger recurrent/hybrid weights were not practical on this machine: mlx-community/falcon-mamba-7b-4bit failed a 2K-token smoke test with a Metal resource-limit error, and mlx-community/AI21-Jamba-Reasoning-3B-4bit required ~111s TTFT for only 2K input tokens.

| Mode | Input tokens | Output tokens | TTFT | Total time | Prompt+generation TPS | Decode TPS after TTFT | Peak memory | Profile events |
|---|---|---|---|---|---|---|---|---|
| Profiling disabled | 100,000 | 10,000 | 10.15s | 39.41s | 2,790.93 tok/s | 341.73 tok/s | 1.65 GB | 0 |
| MLX_LM_PROFILE_RECURRENT=1 | 100,000 | 10,000 | 11.38s | 119.69s | 919.04 tok/s | 92.33 tok/s | 0.62 GB | 241,200 |
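For readers checking the throughput columns, the sketch below recomputes them from the token counts and times in the table. The column definitions are inferred from the header names and are an assumption: prompt+generation TPS is taken as all tokens divided by total time, and decode TPS as output tokens divided by the time remaining after TTFT. The function name `throughput` is illustrative.

```python
def throughput(input_tokens, output_tokens, ttft_s, total_s):
    """Return (prompt+generation TPS, decode TPS after TTFT)."""
    total_tps = (input_tokens + output_tokens) / total_s
    decode_tps = output_tokens / (total_s - ttft_s)
    return total_tps, decode_tps


# Values copied from the benchmark table.
off_total, off_decode = throughput(100_000, 10_000, 10.15, 39.41)
on_total, on_decode = throughput(100_000, 10_000, 11.38, 119.69)

# Ratio of total wall-clock time with profiling on vs. off.
slowdown = 119.69 / 39.41

print(f"disabled: {off_total:.2f} tok/s total, {off_decode:.2f} tok/s decode")
print(f"enabled:  {on_total:.2f} tok/s total, {on_decode:.2f} tok/s decode")
print(f"slowdown: {slowdown:.2f}x")
```

Under these definitions the results match the reported figures to within the rounding of the displayed times.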

The profiling-enabled run emitted 241,200 JSON records: 1,176 ssm_attn records and 240,024 metal_step records. The recorded recurrent elapsed times sum to 105.68s. This PR does not claim an inference speedup. It adds an opt-in profiling mode: with the variable unset, the run emits no profile events and avoids profiling overhead, while the enabled run took about 3x longer in total (119.69s vs. 39.41s) and captured detailed path and timing data.
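Per-kind counts and elapsed totals like those quoted here could be derived from the emitted records with a small script such as the one below. It assumes one JSON object per line and hypothetical field names `kind` and `elapsed_s`; the actual record schema is not shown in this description, and the sample records are made up purely to exercise the function.

```python
import json
from collections import Counter, defaultdict


def summarize(lines):
    """Count records and sum elapsed seconds per record kind."""
    counts = Counter()
    elapsed = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        rec = json.loads(line)
        counts[rec["kind"]] += 1  # "kind" is an assumed field name
        elapsed[rec["kind"]] += rec["elapsed_s"]  # "elapsed_s" is assumed too
    return counts, dict(elapsed)


# Made-up sample records, for illustration only.
sample = [
    '{"kind": "ssm_attn", "elapsed_s": 0.5}',
    '{"kind": "metal_step", "elapsed_s": 0.25}',
    '{"kind": "metal_step", "elapsed_s": 0.25}',
]
counts, elapsed = summarize(sample)
for kind, n in counts.items():
    print(f"{kind}: {n} records, {elapsed[kind]:.2f}s")
```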

Test plan

  • python -m unittest discover -s tests -p test_recurrent_profile.py
  • MLX_LM_PROFILE_RECURRENT=1 gated-delta smoke test
  • Real-weight Mamba2 100K/10K profiling-off/on benchmark above

Expose an opt-in recurrent profiling hook so long-prefill recurrent paths can report their selected implementation path and elapsed time without changing default behavior.