Add opt-in recurrent profiling instrumentation #1318
Open
xxxkkw wants to merge 1 commit into
Expose an opt-in recurrent profiling hook so long-prefill recurrent paths can report their selected implementation path and elapsed time without changing default behavior.
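A minimal sketch of how an env-gated hook of this kind could work, assuming a context-manager design. Only the variable name `MLX_LM_PROFILE_RECURRENT` comes from this PR; the helper names `profiling_enabled` and `recurrent_profile`, the `"1"` check, and the record fields `path` and `elapsed_s` are hypothetical and not taken from the PR's code. When the variable is unset, the hook yields immediately and records nothing, which keeps default behavior unchanged.

```python
import json
import os
import sys
import time
from contextlib import contextmanager

# Env var name is from the PR; everything else here is a hypothetical illustration.
ENV_VAR = "MLX_LM_PROFILE_RECURRENT"


def profiling_enabled() -> bool:
    """Return True only when the opt-in env var is set to "1" (assumed convention)."""
    return os.environ.get(ENV_VAR) == "1"


@contextmanager
def recurrent_profile(path: str, stream=None, **extra):
    """Time the wrapped block and emit one JSON record when profiling is enabled.

    `path` names the selected implementation (e.g. "ssm_attn" or "metal_step").
    Extra keyword arguments are copied into the record as additional fields.
    """
    if not profiling_enabled():
        # Disabled (default): no timing, no output.
        yield
        return
    out = stream if stream is not None else sys.stderr
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        record = {"path": path, "elapsed_s": elapsed, **extra}
        out.write(json.dumps(record) + "\n")
```

Because MLX evaluates lazily, code wrapped this way would need to force evaluation inside the block (for example with `mx.eval(...)`) for the elapsed time to reflect the kernel rather than graph construction.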
Summary
Profiling is opt-in via `MLX_LM_PROFILE_RECURRENT`.
Environment
Real-weight benchmark
Model: `mlx-community/mamba2-130m-4bit` with a config override for `intermediate_size=1536`, matching the model config's `expand * hidden_size`.
Protocol: 100K input tokens and 10K generated tokens, `prefill_step_size=2048`, greedy generation.
This model was used because larger, typical-size recurrent/hybrid weights were not practical on this machine: `mlx-community/falcon-mamba-7b-4bit` failed a 2K-token smoke test with a Metal resource-limit error, and `mlx-community/AI21-Jamba-Reasoning-3B-4bit` required ~111 s time to first token (TTFT) for only 2K input tokens.
With `MLX_LM_PROFILE_RECURRENT=1`, the profiling-enabled run emitted 241,200 JSON records: 1,176 `ssm_attn` records and 240,024 `metal_step` records. The recorded recurrent elapsed time summed to 105.68 s.
This PR does not claim an inference speedup; it provides an opt-in profiling mode. The long benchmark run shows that the default (disabled) mode avoids profiling overhead, while enabled profiling captures detailed path and timing data.
Test plan
- `python -m unittest discover -s tests -p test_recurrent_profile.py`
- `MLX_LM_PROFILE_RECURRENT=1` gated-delta smoke test
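Profiling output from a smoke or benchmark run can be summarized per implementation path, which is one way to obtain per-path record counts and a summed elapsed time like those reported for the real-weight benchmark. This is a sketch assuming one JSON object per line with hypothetical fields `path` and `elapsed_s`; the actual record schema emitted by this PR may differ.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_records(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[str, float], float]:
    """Count records and sum elapsed seconds per implementation path.

    Assumes JSON-lines input with hypothetical fields "path" (str) and
    "elapsed_s" (float). Blank lines are skipped.
    Returns (counts per path, elapsed seconds per path, total elapsed seconds).
    """
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        counts[rec["path"]] += 1
        totals[rec["path"]] += float(rec["elapsed_s"])
    return dict(counts), dict(totals), sum(totals.values())
```

For example, `with open("profile.jsonl") as f: counts, totals, total = summarize_records(f)`, where `profile.jsonl` is a hypothetical file holding captured profiling output.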