feat(server): add mixed-backend DFlash disk prefix cache for target layer split by weicj · Pull Request #352 · Luce-Org/lucebox-hub

weicj · 2026-06-08T07:26:13Z

Summary

This PR is the mixed-backend follow-up to #325. #325 restored disk prefix caching for same-backend target layer split; this PR adds the missing remote-shard snapshot export/import path for CUDA/HIP mixed-backend target split. With this change, placements such as --target-devices cuda:0,hip:0,hip:1 can be combined with --kv-cache-dir, and a restarted server can restore the split target state from the disk prefix cache.

Previously, mixed-backend target split could not safely enable disk cache because the target prefix state does not live only in the parent process. The parent can write its local shard into the disk snapshot, but the remote HIP/CUDA shards live inside the backend IPC daemon. Without remote shard snapshot IPC, disk restore could only recover the parent-local part and would leave the split target state incomplete.

This PR makes remote shard snapshots an explicit IPC operation. On save, the parent asks the remote target shard daemon to export its snapshot tensors, then writes them into the same disk snapshot as the local shard tensors, snap_prefill_logits, and the DFlash feature mirror. On load, the parent splits the remote shard tensors back out of the disk snapshot and imports them into the remote daemon. The target state and the DFlash feature mirror are restored from the same snapshot, so speculative decode can continue after a disk restore.
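The save/load round trip described above can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: Tensor, ShardTensors, Snapshot, SplitSnapshot, save_snapshot and split_snapshot are hypothetical names, tensors are plain byte vectors, and the IPC export/import calls are replaced by ordinary vectors of shard tensors. Only the ls<shard>_<tensor-name> key scheme, the placement of remote shards after the local shard index range, and the shared snap_prefill_logits entry come from the PR text.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins: a tensor is its raw bytes, a shard is a
// tensor-name -> tensor map, and a disk snapshot is a flat key -> tensor map.
using Tensor = std::vector<uint8_t>;
using ShardTensors = std::map<std::string, Tensor>;
using Snapshot = std::map<std::string, Tensor>;

// Key for one shard-local tensor, following the ls<shard>_<tensor-name> scheme.
std::string shard_key(size_t shard, const std::string& name) {
    return "ls" + std::to_string(shard) + "_" + name;
}

// Save: local shards take indices [0, local.size()); tensors exported by the
// remote daemon (here just passed in) are appended after that index range.
Snapshot save_snapshot(const std::vector<ShardTensors>& local,
                       const std::vector<ShardTensors>& remote,
                       const Tensor& prefill_logits) {
    Snapshot snap;
    size_t shard = 0;
    for (const auto* group : {&local, &remote}) {
        for (const auto& tensors : *group) {
            for (const auto& [name, data] : tensors) snap[shard_key(shard, name)] = data;
            ++shard;
        }
    }
    snap["snap_prefill_logits"] = prefill_logits;  // shared, not per shard
    return snap;
}

struct SplitSnapshot {
    std::vector<ShardTensors> local;   // adopted back into the parent adapter
    std::vector<ShardTensors> remote;  // to be imported into the remote daemon
    Snapshot shared;                   // logits, DFlash feature mirror, ...
};

// Load: route each ls<N>_ key by shard index; every other key is shared state.
SplitSnapshot split_snapshot(const Snapshot& snap, size_t local_count, size_t remote_count) {
    SplitSnapshot out;
    out.local.resize(local_count);
    out.remote.resize(remote_count);
    for (const auto& [key, data] : snap) {
        const size_t us = key.find('_');
        const bool is_shard = key.rfind("ls", 0) == 0 && us != std::string::npos && us > 2 &&
                              key.find_first_not_of("0123456789", 2) == us;
        if (!is_shard) {
            out.shared[key] = data;
            continue;
        }
        const size_t idx = std::stoul(key.substr(2, us - 2));
        const std::string name = key.substr(us + 1);
        if (idx < local_count) {
            out.local[idx][name] = data;
        } else if (idx - local_count < remote_count) {
            out.remote[idx - local_count][name] = data;
        } else {
            throw std::runtime_error("snapshot shard index out of range: " + key);
        }
    }
    return out;
}
```

Out-of-range shard indices are rejected rather than silently dropped, which matches the intent of the later "validate remote disk snapshot shards" commit; the actual bounds checks in the PR may differ.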

Changes

- Adds Qwen35 remote target shard snapshot IPC:
  - adds prefix_snapshot_export to export shard-local snapshot tensors and logits from the remote target shard daemon;
  - adds prefix_snapshot_import to rebuild disk-loaded remote tensors into a prefix snapshot slot in the remote daemon;
  - includes shard id, tensor name, dtype, shape, and payload size in each snapshot tensor header, and validates at import time that the shape and dtype are consistent with the payload size.
- Extends the Qwen35 layer-split disk snapshot:
  - same-backend splits still store local ls<shard>_<tensor-name> tensors;
  - mixed-backend splits additionally store remote shard tensors after the local shard index range;
  - restore adopts local shards back into the parent adapter and imports remote shard tensors back into the backend IPC daemon;
  - keeps snap_prefill_logits, dflash_feature_meta, and dflash_feature_data, so DFlash disk restore does not degrade into a target-only partial state.
- Updates server placement validation to allow --kv-cache-dir with mixed-backend --target-devices; remote target shard IPC still has to be provided explicitly.
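The import-time header check listed under Changes can be illustrated with a small sketch. The header fields (shard id, tensor name, dtype, shape, payload size) are the ones the PR names; the struct layout, the DType enum, and the function names are hypothetical, and the sketch only covers plain element types, not block-quantized formats.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical element types; the real dtype enum in the IPC code may differ,
// and block-quantized types would need a per-block size calculation instead.
enum class DType : uint8_t { F32, F16, BF16 };

uint64_t dtype_size(DType t) {
    switch (t) {
        case DType::F32: return 4;
        case DType::F16:
        case DType::BF16: return 2;
    }
    throw std::invalid_argument("unknown dtype");
}

// Fields the PR lists for each snapshot tensor header (layout is assumed).
struct SnapshotTensorHeader {
    int32_t shard_id;
    std::string name;
    DType dtype;
    std::vector<int64_t> shape;
    uint64_t payload_bytes;
};

// Import-time check: product(shape) * dtype_size must equal payload_bytes.
// Returns false instead of throwing, so a caller can still read and discard
// ("drain") the payload and keep the IPC byte stream aligned.
bool header_matches_payload(const SnapshotTensorHeader& h) {
    if (h.shard_id < 0 || h.name.empty()) return false;
    uint64_t elems = 1;
    for (int64_t d : h.shape) {
        if (d < 0) return false;
        const uint64_t ud = static_cast<uint64_t>(d);
        if (ud != 0 && elems > UINT64_MAX / ud) return false;  // would overflow
        elems *= ud;
    }
    const uint64_t esize = dtype_size(h.dtype);
    if (elems > UINT64_MAX / esize) return false;  // would overflow
    return elems * esize == h.payload_bytes;
}
```

A header that fails this check should still have its payload consumed before the error is reported; otherwise the next header would be read from the middle of tensor data. The "drain invalid snapshot import headers" commit suggests the PR handles this case, but the details above are an assumption.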

Notes

Local runtime validation passed on Tesla P4 CUDA + dual Pro VII HIP with a Qwen3.6-27B Q4 target, the Qwen3.6 DFlash draft, and --target-devices cuda:0,hip:0,hip:1 --target-layer-split 0.08,0.46,0.46. The first server process saved the disk prefix cache; after a cold restart, the second process hit the disk cache and logged [target-split] adopted disk snapshot slot=63 local_shards=1 remote_shards=2 pos=10, with disk_hit=true and restore=true, and DFlash speculative decode continued with accepted draft tokens.
Remote runtime validation also passed on RTX 3090 CUDA + Radeon 8060S Strix Halo HIP/gfx1151 with a Qwen3.6-27B Q4 target, the Qwen3.6 DFlash draft, and --target-devices cuda:0,hip:0 --target-layer-split 0.5,0.5. On the second cold start, the server scanned the disk cache and logged [target-split] adopted disk snapshot slot=63 local_shards=1 remote_shards=1 pos=10, with disk_hit=true and restore=true; DFlash continued speculative decode with accepted draft tokens after the restore.
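Both runs select their placement with --target-devices and --target-layer-split, and the placement validation change in this PR depends on recognizing a mixed-backend device list. The sketch below shows one way such values could be parsed and classified. Device, split_csv, parse_target_devices, parse_layer_split and is_mixed_backend are hypothetical names, and the rule that the split has one positive fraction per device summing to 1 is an assumption inferred from the two command lines above (0.08,0.46,0.46 and 0.5,0.5), not something the PR states.

```cpp
#include <cmath>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parsed form of one --target-devices entry such as "hip:1".
struct Device {
    std::string backend;
    int index;
};

std::vector<std::string> split_csv(const std::string& s) {
    std::vector<std::string> out;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ',')) out.push_back(item);
    return out;
}

// Parses "cuda:0,hip:0,hip:1" into backend/index pairs.
std::vector<Device> parse_target_devices(const std::string& arg) {
    std::vector<Device> devs;
    for (const auto& tok : split_csv(arg)) {
        const size_t colon = tok.find(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 == tok.size())
            throw std::invalid_argument("bad device entry: " + tok);
        devs.push_back({tok.substr(0, colon), std::stoi(tok.substr(colon + 1))});
    }
    return devs;
}

// Assumption: one positive fraction per device, summing to 1.
std::vector<double> parse_layer_split(const std::string& arg, size_t n_devices) {
    std::vector<double> fractions;
    double sum = 0.0;
    for (const auto& tok : split_csv(arg)) {
        const double f = std::stod(tok);
        if (!(f > 0.0)) throw std::invalid_argument("non-positive split: " + tok);
        fractions.push_back(f);
        sum += f;
    }
    if (fractions.size() != n_devices)
        throw std::invalid_argument("split count does not match device count");
    if (std::fabs(sum - 1.0) > 1e-6) throw std::invalid_argument("split does not sum to 1");
    return fractions;
}

// Mixed-backend means more than one distinct backend prefix (e.g. cuda + hip).
bool is_mixed_backend(const std::vector<Device>& devs) {
    std::set<std::string> backends;
    for (const auto& d : devs) backends.insert(d.backend);
    return backends.size() > 1;
}
```

With cuda:0,hip:0,hip:1 the list is classified as mixed-backend, which is the case where the remote target shard IPC must be available for the disk snapshot to cover all shards.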

cubic-dev-ai

2 issues found across 5 files


cubic-dev-ai

1 issue found


… disk prefix cache for target layer split

cubic-dev-ai

1 issue found across 1 file (changes from recent commits).


feat(server): add mixed-backend DFlash disk prefix cache

6a5a84a

weicj force-pushed the feat-mixed-backend-disk-prefix-cache branch from 3309ee5 to 6a5a84a June 8, 2026 07:32

weicj marked this pull request as ready for review June 8, 2026 10:30

cubic-dev-ai Bot reviewed Jun 8, 2026


Comment thread server/src/qwen35/qwen35_target_shard_ipc.cpp

Comment thread server/src/qwen35/qwen35_target_shard_ipc_daemon.cpp

fix(server): handshake target shard snapshot import

2c36d02

cubic-dev-ai Bot reviewed Jun 8, 2026


Comment thread server/src/qwen35/qwen35_target_shard_ipc_daemon.cpp Outdated

weicj added 2 commits June 8, 2026 19:31

fix(server): preserve failed snapshot import status

8031145

fix(server): drain invalid snapshot import headers

4e22769

weicj force-pushed the feat-mixed-backend-disk-prefix-cache branch from 42c2ff1 to 4e22769 June 8, 2026 16:44

easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 8, 2026

merge: include PR Luce-Org#352 feat(server): add mixed-backend DFlash disk prefix cache for target layer split

78834be

fix(server): validate remote disk snapshot shards

11aba37

cubic-dev-ai Bot reviewed Jun 8, 2026


Comment thread server/src/qwen35/qwen35_layer_split_adapter.cpp Outdated

fix(server): bound remote disk snapshot shard names

ef4f724

weicj wants to merge 6 commits into Luce-Org:main from weicj:feat-mixed-backend-disk-prefix-cache