[feat] rollout indexer replay support by yueming-yuan · Pull Request #1183 · radixark/miles

yueming-yuan · 2026-05-22T21:56:11Z

Summary

Add a generic IndexerReplayManager and sequential replay registration for indexer top-k streams.
Thread indexer_topk through SGLang rollout/session/OpenAI response handling and training data plumbing.
Add generic rollout indexer replay shape args without DeepSeek-V4-specific fallbacks.

Tests

python -m compileall miles tests/fast/rollout/generate_utils/test_indexer_replay.py tests/fast/backends/megatron_utils/test_replay_utils.py
uvx ruff check ... on touched files
uvx black --check ... on touched files
python -m pytest --confcutdir=tests/fast/rollout/generate_utils tests/fast/rollout/generate_utils/test_indexer_replay.py tests/fast/rollout/generate_utils/test_sample_utils.py tests/fast/rollout/generate_utils/test_openai_endpoint_utils.py -k 'not test_create_fetches_session_server_instance_id'
python -m pytest --confcutdir=tests/fast/backends/megatron_utils tests/fast/backends/megatron_utils/test_replay_utils.py
python -m pytest --confcutdir=tests/fast/utils tests/fast/utils/test_types.py

gemini-code-assist

Code Review

This pull request implements indexer replay, which captures indexer top-k decisions from the rollout engine and reuses them during training. Changes span CLI argument handling, data processing in the Megatron and SGLang backends, and sample management utilities. The key feedback identifies a critical issue: sequential replay registration fails under pipeline parallelism because the replay streams are not sliced per stage. In addition, the IndexerReplayManager configuration needs adjusting to support sequence parallelism, and it should read its thresholds from the model configuration instead of hardcoding them.

gemini-code-assist · 2026-05-22T21:59:33Z

+    if replay_data.shape[1] != len(replay_list):
+        raise AssertionError(
+            f"replay data has {replay_data.shape[1]} streams, but {len(replay_list)} modules registered replay"
+        )


The _register_replay_list_sequential function is incompatible with pipeline parallelism (PP > 1). replay_data (sourced from SGLang) contains indexer streams for all layers in the model, whereas replay_list only contains the modules registered on the current PP rank. This discrepancy will cause the AssertionError to trigger on any rank where the number of local indexer modules does not match the total number of streams.

To fix this, you should use the pipeline stage offset to slice the replay_data streams, similar to the logic implemented in _register_replay_list_moe.
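The slicing described above can be sketched as follows. This is a minimal illustration only, not the code from this PR or from `_register_replay_list_moe`: the function name `slice_streams_for_stage` and its parameters `stage_offset` and `num_local_modules` are hypothetical, and plain nested lists laid out as `[token][stream]` stand in for the real tensor.

```python
from typing import List, Sequence


def slice_streams_for_stage(
    replay_rows: Sequence[Sequence[str]],
    stage_offset: int,
    num_local_modules: int,
) -> List[List[str]]:
    """Keep only the streams that belong to the current pipeline stage.

    replay_rows is laid out as [token][stream], mirroring the shape[1] check
    in the diff. stage_offset (hypothetical) is the number of indexer modules
    that live on earlier pipeline stages.
    """
    end = stage_offset + num_local_modules
    sliced = []
    for row in replay_rows:
        if stage_offset < 0 or end > len(row):
            raise ValueError(
                f"stage needs streams [{stage_offset}, {end}) but row has {len(row)}"
            )
        sliced.append(list(row[stage_offset:end]))
    return sliced


# Two tokens, four indexer streams across the whole model.
replay_rows = [
    ["t0/s0", "t0/s1", "t0/s2", "t0/s3"],
    ["t1/s0", "t1/s1", "t1/s2", "t1/s3"],
]

# Second of two pipeline stages: two modules live on the earlier stage.
local_rows = slice_streams_for_stage(replay_rows, stage_offset=2, num_local_modules=2)
print(local_rows)
```

After slicing, the number of local streams equals the number of locally registered modules, so a length check like the one in the diff would compare like with like on every rank.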

gemini-code-assist · 2026-05-22T21:59:33Z

+    if_sp_region = False
+    enable_check_replay_result = False
+    replay_check_threshold = 0.7


There are two issues in the IndexerReplayManager configuration:

1. if_sp_region is set to False. If indexers are used within transformer layers (which is typical for models like DeepSeek-V3/V4) and sequence parallelism is enabled, this will cause a shape mismatch crash in _get_replay_result because the scores tensor will be sliced while the replayed top_indices will not be.

2. replay_check_threshold is set to 0.7. This is extremely loose compared to RoutingReplayManager (0.01). A 70% allowed mismatch effectively disables the utility of the CI correctness check for indexer replay, as training behavior would likely diverge significantly with such a high mismatch rate.

Additionally, ensure these parameters are retrieved from the model configuration rather than being hardcoded to maintain consistency with repository guidelines.
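To make the first issue concrete, here is a small sketch of the shape mismatch. It assumes sequence parallelism gives each rank an equal, contiguous chunk of the sequence dimension; the helper `slice_for_sp_rank` and the variable names are hypothetical, and nested lists stand in for the `scores` and `top_indices` tensors.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_for_sp_rank(rows: Sequence[T], sp_rank: int, sp_world_size: int) -> List[T]:
    """Return the contiguous chunk of the sequence held by one rank.

    Assumption: the sequence is split evenly and contiguously across ranks.
    """
    if not 0 <= sp_rank < sp_world_size:
        raise ValueError(f"sp_rank {sp_rank} out of range for {sp_world_size} ranks")
    if len(rows) % sp_world_size != 0:
        raise ValueError("sequence length must be divisible by sp_world_size")
    chunk = len(rows) // sp_world_size
    start = sp_rank * chunk
    return list(rows[start : start + chunk])


seq_len, sp_world_size, sp_rank = 8, 4, 1

# One row per token; the per-token contents are placeholders.
full_scores = [[float(t), float(t) + 0.5] for t in range(seq_len)]
full_top_indices = [[t, t + 1] for t in range(seq_len)]

# Inside the sequence-parallel region each rank only sees its own chunk of scores.
local_scores = slice_for_sp_rank(full_scores, sp_rank, sp_world_size)

# Unsliced replay data has 8 rows against 2 rows of scores: the reported mismatch.
print(len(full_top_indices), len(local_scores))

# Slicing the replayed indices the same way restores matching shapes.
local_top_indices = slice_for_sp_rank(full_top_indices, sp_rank, sp_world_size)
print(len(local_top_indices), len(local_scores))
```

Whether `if_sp_region` triggers exactly this kind of slicing in `IndexerReplayManager` is taken from the review comment and is not verified here.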

Suggested change

-    if_sp_region = False
-    enable_check_replay_result = False
-    replay_check_threshold = 0.7
+    if_sp_region = config.if_sp_region
+    enable_check_replay_result = False
+    replay_check_threshold = config.replay_check_threshold

References

Model parameters, such as index_topk, should be retrieved from the model configuration rather than being hardcoded.
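As a rough illustration of why the threshold value matters, the sketch below reads the threshold from a config object and compares a replayed top-k set against a recomputed one. Everything here is assumed for illustration: `IndexerReplayConfig`, `topk_mismatch_rate`, and `replay_check_passes` are hypothetical names, and the mismatch definition (fraction of replayed indices missing from the recomputed top-k) may differ from what the repository actually computes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class IndexerReplayConfig:
    """Hypothetical config holder; field names follow the suggested change."""

    if_sp_region: bool
    replay_check_threshold: float


def topk_mismatch_rate(
    replayed: Sequence[Sequence[int]], recomputed: Sequence[Sequence[int]]
) -> float:
    """Fraction of replayed indices that are absent from the recomputed top-k."""
    if len(replayed) != len(recomputed):
        raise ValueError("replayed and recomputed must cover the same tokens")
    total = 0
    missing = 0
    for replayed_row, recomputed_row in zip(replayed, recomputed):
        recomputed_set = set(recomputed_row)
        total += len(replayed_row)
        missing += sum(1 for idx in replayed_row if idx not in recomputed_set)
    return missing / total if total else 0.0


def replay_check_passes(rate: float, config: IndexerReplayConfig) -> bool:
    """The check passes when the mismatch rate does not exceed the threshold."""
    return rate <= config.replay_check_threshold


replayed = [[0, 1, 2, 3], [4, 5, 6, 7]]
recomputed = [[0, 1, 2, 9], [4, 5, 8, 9]]
rate = topk_mismatch_rate(replayed, recomputed)  # 3 of 8 indices differ

loose = IndexerReplayConfig(if_sp_region=True, replay_check_threshold=0.7)
strict = IndexerReplayConfig(if_sp_region=True, replay_check_threshold=0.01)
print(rate, replay_check_passes(rate, loose), replay_check_passes(rate, strict))
```

In this toy case more than a third of the replayed indices disagree with the recomputed ones, yet the check still passes at 0.7 and fails only at 0.01, which is the concern raised in the comment.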

Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>

yueming-yuan requested review from Zhichenzzz, fzyzcjy, guapisolo, jybsuper, maocheng23 and yushengsu-thu as code owners May 22, 2026 21:56

gemini-code-assist Bot reviewed May 22, 2026

yueming-yuan changed the title ~~Add rollout indexer replay support~~ [feat] rollout indexer replay support May 22, 2026

Add rollout indexer replay support

a40696d

Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>

yueming-yuan force-pushed the indexer-replay branch from 1057aad to a40696d Compare May 22, 2026 22:09

yueming-yuan wants to merge 1 commit into radixark:main from yueming-yuan:indexer-replay
