feat(server): add mixed-backend DFlash disk prefix cache for target layer split #352
Open
weicj wants to merge 6 commits into
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 8, 2026: … disk prefix cache for target layer split
Summary
This PR is the mixed-backend follow-up to #325. #325 restored disk prefix cache for same-backend target layer split; this PR adds the missing remote-shard snapshot export/import path for CUDA/HIP mixed-backend target split. With this change, placements such as `--target-devices cuda:0,hip:0,hip:1` can be used together with `--kv-cache-dir`, and a restarted server can restore the split target state from disk prefix cache.

Previously, mixed-backend target split could not safely enable disk cache because the target prefix state does not live only in the parent process. The parent can write its local shard into the disk snapshot, but the remote HIP/CUDA shards live inside the backend IPC daemon. Without remote shard snapshot IPC, a disk restore could recover only the parent-local part and would leave the split target state incomplete.
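To make "mixed-backend" concrete, here is a minimal Python sketch that classifies a `--target-devices` value by backend. It is not the server's parser: the `backend:index` format is inferred only from the examples in this PR, and the helper names `device_backends` and `is_mixed_backend` are hypothetical.

```python
from typing import List


def device_backends(target_devices: str) -> List[str]:
    """Return the backend name of each device in a 'cuda:0,hip:0' style list.

    Assumption: every entry has the form '<backend>:<index>', as in the
    examples shown in this PR. The real parser may accept other forms.
    """
    backends = []
    for entry in target_devices.split(","):
        backend, sep, index = entry.strip().partition(":")
        if not backend or not sep or not index.isdigit():
            raise ValueError(f"unexpected device entry: {entry!r}")
        backends.append(backend.lower())
    return backends


def is_mixed_backend(target_devices: str) -> bool:
    """True when the placement spans more than one backend (e.g. CUDA and HIP)."""
    return len(set(device_backends(target_devices))) > 1
```

Under these assumptions, `is_mixed_backend("cuda:0,hip:0,hip:1")` is true, while a same-backend placement such as `cuda:0,cuda:1` is not, which is the case #325 already covered.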
This PR makes remote shard snapshots an explicit IPC operation. On save, the parent asks the remote target shard daemon to export its snapshot tensors, then writes them into the same disk snapshot as the local shard tensors, `snap_prefill_logits`, and the DFlash feature mirror. On load, the parent splits remote shard tensors back out of the disk snapshot and imports them into the remote daemon. DFlash target restore and feature-mirror restore stay in the same snapshot, so speculative decode can continue after disk restore.

Changes
- `prefix_snapshot_export` to export shard-local snapshot tensors and logits from the remote target shard daemon;
- `prefix_snapshot_import` to rebuild disk-loaded remote tensors into a remote daemon prefix snapshot slot;
- `ls<shard>_<tensor-name>` tensors;
- `snap_prefill_logits`, `dflash_feature_meta`, and `dflash_feature_data`, so DFlash disk restore does not degrade into a target-only partial state.
- `--kv-cache-dir` with mixed-backend `--target-devices`; remote target shard IPC still has to be provided explicitly.

Notes
- `--target-devices cuda:0,hip:0,hip:1 --target-layer-split 0.08,0.46,0.46`. The first server process saved the disk prefix cache; after a cold restart, the second process hit the disk cache and logged `[target-split] adopted disk snapshot slot=63 local_shards=1 remote_shards=2 pos=10`, `disk_hit=true`, and `restore=true`, followed by DFlash speculative decode with accepted draft tokens.
- `--target-devices cuda:0,hip:0 --target-layer-split 0.5,0.5`. The second cold start scanned the disk cache and logged `[target-split] adopted disk snapshot slot=63 local_shards=1 remote_shards=1 pos=10`, `disk_hit=true`, and `restore=true`; DFlash continued speculative decode with accepted draft tokens after restore.
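
The save and load paths described in the Summary can be illustrated with a small Python sketch of the merge and split steps. It is an illustration under stated assumptions, not the PR's implementation: it assumes the `ls<shard>_<tensor-name>` prefix marks remote shard tensors, models tensors as raw bytes, and uses hypothetical helper names (`merge_snapshot`, `split_snapshot`) and a hypothetical tensor name (`k_cache`).

```python
import re
from typing import Dict, Tuple

Tensors = Dict[str, bytes]

# Assumption: remote shard tensors are stored as ls<shard>_<tensor-name>.
_REMOTE_KEY = re.compile(r"^ls(\d+)_(.+)$")


def merge_snapshot(local: Tensors, remote: Dict[int, Tensors]) -> Tensors:
    """Save path: combine parent-local tensors with exported remote shard tensors."""
    merged = dict(local)
    for shard, tensors in remote.items():
        for name, data in tensors.items():
            key = f"ls{shard}_{name}"
            if key in merged:
                raise ValueError(f"duplicate snapshot tensor: {key}")
            merged[key] = data
    return merged


def split_snapshot(merged: Tensors) -> Tuple[Tensors, Dict[int, Tensors]]:
    """Load path: separate remote shard tensors so each shard can be imported."""
    local: Tensors = {}
    remote: Dict[int, Tensors] = {}
    for key, data in merged.items():
        match = _REMOTE_KEY.match(key)
        if match:
            shard, name = int(match.group(1)), match.group(2)
            remote.setdefault(shard, {})[name] = data
        else:
            local[key] = data
    return local, remote


# Hypothetical data: one local shard plus two remote shards.
local = {"snap_prefill_logits": b"L", "dflash_feature_meta": b"M"}
remote = {1: {"k_cache": b"A"}, 2: {"k_cache": b"B"}}
snapshot = merge_snapshot(local, remote)
restored_local, restored_remote = split_snapshot(snapshot)
```

In this sketch the round trip is lossless: `split_snapshot(merge_snapshot(local, remote))` returns the original local and per-shard tensors, which is the property a cold restart relies on. In the PR itself, the export and import steps go through the `prefix_snapshot_export` and `prefix_snapshot_import` IPC operations rather than in-process dictionaries.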