feat(dflash): native multi-request scheduler with batched target step#135

Open
javierpazo wants to merge 1 commit into Luce-Org:main from javierpazo:xabicasa/dflash-multi-request-scheduler-batched-target-step

@javierpazo javierpazo commented May 9, 2026

Contributor

Summary

Brings concurrent multi-request execution to test_dflash on a
single GPU. Internally one cohesive unit; happy to split into
four sequential PRs (A / B / C / D below) if you prefer per
CONTRIBUTING's "one concern per PR"; let me know and I'll
re-open as a chain. I kept it bundled because the four pieces
share the same hunks of test_dflash.cpp (~+2130 lines) and
splitting cleanly would require careful hunk surgery; doing it on
request is fine.

Pieces in this PR

A. Multi TargetCache slots

  • CLI: --target-cache-slots=N (alias --cache-slots=N)
  • prefix SLOT <id> routes commands to a specific slot
  • DaemonSlotState + RAII ActiveDaemonSlot for safe switching
  • LIST_TARGET_CACHE_SLOTS for introspection
  • all slots share target/draft weights; only KV / SSM / scratch is
    per-slot
  • create_target_cache gains an n_seqs parameter so a single
    cache can be allocated batched up front
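
The slot switching described in piece A can be illustrated with a small RAII guard. This is a minimal sketch, not the code in this PR: only the names DaemonSlotState and ActiveDaemonSlot come from the description above, while the `Daemon` container, the `cur_pos` / `last_tok` fields and the `advance_slot` helper are hypothetical stand-ins chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-slot state; the real struct holds KV / SSM / scratch buffers.
struct DaemonSlotState {
    int cur_pos = 0;    // next decode position for this slot
    int last_tok = -1;  // last committed token, -1 if none yet
};

// Hypothetical daemon: N slots plus the index of the currently active one.
struct Daemon {
    std::vector<DaemonSlotState> slots;
    std::size_t active = 0;
    explicit Daemon(std::size_t n_slots) : slots(n_slots) {}
    DaemonSlotState& current() { return slots[active]; }
};

// RAII guard: makes `slot` active on construction and restores the
// previously active slot on destruction, including on early return.
class ActiveDaemonSlot {
public:
    ActiveDaemonSlot(Daemon& d, std::size_t slot) : d_(d), prev_(d.active) {
        assert(slot < d.slots.size());
        d_.active = slot;
    }
    ~ActiveDaemonSlot() { d_.active = prev_; }
    ActiveDaemonSlot(const ActiveDaemonSlot&) = delete;
    ActiveDaemonSlot& operator=(const ActiveDaemonSlot&) = delete;

private:
    Daemon& d_;
    std::size_t prev_;
};

// Example command: advance one slot by n_tokens and report its new position.
inline int advance_slot(Daemon& d, std::size_t slot, int n_tokens) {
    ActiveDaemonSlot guard(d, slot);  // switch for the scope of this command
    d.current().cur_pos += n_tokens;
    return d.current().cur_pos;
}
```

With a two-slot `Daemon`, calling `advance_slot(d, 1, 3)` moves slot 1 to position 3 and leaves `d.active` at its previous value, so a command prefixed with a slot id cannot leave the daemon pointing at the wrong slot.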

B. Tagged stream protocol (opt-in)

  • --stream-tagged emits frames [-2, request_id, token] instead
    of bare int32 tokens; sentinels -4 (CONTINUE), -1 (DONE)
  • parser recognises REQ <id> / REQUEST <id> headers
  • legacy bare-int32 streaming is unchanged when the flag is off
  • lets a client demux multiple concurrent requests over the same
    stdout
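
A client-side demultiplexer for the tagged frames in piece B might look like the sketch below. The frame layout [-2, request_id, token] and the sentinel values -4 and -1 are taken from the description; how the sentinels are framed is not stated there, so this example assumes they arrive in the token position of a tagged frame. `RequestStream` and `demux_tagged` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

constexpr int32_t kFrameTag = -2;  // first word of every tagged frame
constexpr int32_t kContinue = -4;  // CONTINUE sentinel
constexpr int32_t kDone     = -1;  // DONE sentinel

// Hypothetical per-request view reconstructed by the client.
struct RequestStream {
    std::vector<int32_t> tokens;
    bool awaiting_continue = false;
    bool done = false;
};

// Splits a flat sequence of int32 words into per-request streams.
// Assumption: sentinels occupy the token field of a frame.
// Returns false on a truncated or malformed frame.
inline bool demux_tagged(const std::vector<int32_t>& words,
                         std::map<int32_t, RequestStream>& out) {
    if (words.size() % 3 != 0) return false;  // frames are three words each
    for (std::size_t i = 0; i < words.size(); i += 3) {
        if (words[i] != kFrameTag) return false;
        RequestStream& rs = out[words[i + 1]];
        const int32_t payload = words[i + 2];
        if (payload == kDone) {
            rs.done = true;
        } else if (payload == kContinue) {
            rs.awaiting_continue = true;
        } else if (payload >= 0) {
            rs.awaiting_continue = false;
            rs.tokens.push_back(payload);
        } else {
            return false;  // unknown negative payload
        }
    }
    return true;
}
```

Keying the map by request id is what lets two requests interleave on the same stdout without the client needing to know which slot served them.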

C. Native quantum scheduler

  • dispatch table for REQ/SLOT/START, SCHED_STEP,
    SCHED_DRAIN, LIST_REQUESTS
  • cursor-based fair round-robin between admitted requests
  • non-blocking reader thread admits new requests during a drain
  • PendingQuantum{slot, req, epoch, n_gen} carries the unit of
    work
  • CONTINUE / CONT resumes a slot without re-prefilling
  • REQ <id> CANCEL invalidates a request and bumps the slot
    epoch so a stale CONTINUE is rejected; RESTORE_CHAIN and
    legacy generate refuse to overwrite a slot that is owned by
    an active scheduler request
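
The round-robin cursor and the epoch check in piece C can be sketched as follows. The `PendingQuantum{slot, req, epoch, n_gen}` shape is from the description; `SchedRequest`, `QuantumScheduler` and its method names are hypothetical, and the real scheduler also has to deal with prefill, the reader thread and CONTINUE, which this sketch leaves out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unit of work, with the fields listed in the PR description.
struct PendingQuantum { int slot; int req; uint64_t epoch; int n_gen; };

// Hypothetical bookkeeping for one admitted request.
struct SchedRequest { int req; int slot; int quantum; int remaining; bool active; };

class QuantumScheduler {
public:
    explicit QuantumScheduler(std::size_t n_slots) : slot_epoch_(n_slots, 0) {}

    void admit(int req, int slot, int quantum, int total_tokens) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < slot_epoch_.size());
        reqs_.push_back({req, slot, quantum, total_tokens, true});
    }

    // Picks the next runnable request starting at the cursor, so every
    // admitted request gets one quantum before any request gets a second.
    bool next(PendingQuantum& out) {
        for (std::size_t k = 0; k < reqs_.size(); ++k) {
            const std::size_t i = (cursor_ + k) % reqs_.size();
            SchedRequest& r = reqs_[i];
            if (!r.active || r.remaining <= 0) continue;
            const int n = std::min(r.quantum, r.remaining);
            r.remaining -= n;
            cursor_ = (i + 1) % reqs_.size();
            out = PendingQuantum{r.slot, r.req,
                                 slot_epoch_[static_cast<std::size_t>(r.slot)], n};
            return true;
        }
        return false;
    }

    // Cancelling bumps the slot epoch, which invalidates older quanta.
    void cancel(int req) {
        for (SchedRequest& r : reqs_) {
            if (r.req == req && r.active) {
                r.active = false;
                ++slot_epoch_[static_cast<std::size_t>(r.slot)];
            }
        }
    }

    // A quantum (or a CONTINUE carrying its epoch) is stale once the epoch moved.
    bool is_current(const PendingQuantum& q) const {
        return q.epoch == slot_epoch_[static_cast<std::size_t>(q.slot)];
    }

private:
    std::vector<SchedRequest> reqs_;
    std::vector<uint64_t> slot_epoch_;
    std::size_t cursor_ = 0;
};
```

Comparing epochs rather than request ids is the point of the hardening: a slot can be reused by a new request with a different id, and an old CONTINUE that still names the slot is rejected because its epoch no longer matches.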

D. Fused batched target step (CUDA path)

  • new commands: SCHED_BATCH_PEEK, SCHED_BATCH_PROBE,
    SCHED_BATCH_TARGET_TAIL, SCHED_BATCH_TARGET_STEP,
    SCHED_BATCH_DRAIN
  • QwenGraphInputs gains n_seqs; build_delta_net_block
    accepts n_seqs > 1
  • target_feat is allocated as [5*hidden, target_feat_cap, n_seqs] when batched and the chain forwards capture features
    per-seq
  • rollback for partially accepted draft tokens, multi-token verify
    and parent-id propagation in the batched path are noted as
    follow-ups; today the batched step accepts the cleanest case
    and falls back to single-seq when needed
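
To make the batched `target_feat` layout in piece D concrete, the sketch below computes element offsets for a [5*hidden, target_feat_cap, n_seqs] buffer. It assumes a contiguous tensor in ggml's dimension order, where the first dimension varies fastest; `TargetFeatShape`, `make_shape`, `feat_offset` and `total_elems` are hypothetical helpers, not functions from this PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shape descriptor for the batched feature buffer.
struct TargetFeatShape {
    std::size_t feat_dim;  // 5 * hidden
    std::size_t cap;       // target_feat_cap (positions per sequence)
    std::size_t n_seqs;    // 1 in the legacy path, > 1 when batched
};

inline TargetFeatShape make_shape(std::size_t hidden, std::size_t cap,
                                  std::size_t n_seqs) {
    assert(hidden > 0 && cap > 0 && n_seqs > 0);
    return TargetFeatShape{5 * hidden, cap, n_seqs};
}

// Element offset of the feature vector for (seq, pos), assuming the
// first dimension is contiguous and the sequence dimension is outermost.
inline std::size_t feat_offset(const TargetFeatShape& s, std::size_t seq,
                               std::size_t pos) {
    assert(seq < s.n_seqs && pos < s.cap);
    return s.feat_dim * (pos + s.cap * seq);
}

inline std::size_t total_elems(const TargetFeatShape& s) {
    return s.feat_dim * s.cap * s.n_seqs;
}
```

Under this assumption each sequence owns one contiguous block of `feat_dim * cap` elements, and with `n_seqs = 1` the offsets reduce to those of a plain [5*hidden, target_feat_cap] buffer.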

Validation

Per CONTRIBUTING ("benchmark before and after on the same hardware,
same warmup"). Single GPU1 RTX 6000 Ada (sm_89), Heretic Q4_K_M
target, Q8 GGUF or FP16 safetensors drafter, FA_WINDOW=0, KV
q4_0/q4_0:

| Scenario | Result |
| --- | --- |
| Two concurrent requests, REQ 4 START SLOT 0 quantum=2 + REQ 5 START SLOT 1 quantum=2, then SCHED_DRAIN | closes both clean; slot 0 = 18.41 tok/s, slot 1 = 22.50 tok/s |
| Mid-drain admission of REQ 6 | succeeds; CONTINUE on slot 0 resumes without re-prefill |
| batch_probe_compare_ok over a 2-seq probe | mismatches = 0 vs the single-seq path |
| batch_tail_commit (2 completed pending quanta) | 29.26 ms |
| batch_step_commit followed by SCHED_DRAIN | 29.57 ms, then reverts cleanly back to the DFlash single-seq path |

Methodology: warmup of 1 request before measurement; same --budget
and KV-quant settings across runs; nothing else competing on the GPU
during the measurement window.

Compatibility

  • All new behaviour is opt-in. Default invocation of test_dflash
    with no scheduler flags keeps the legacy single-request path
    byte-identical.
  • Tagged stream gated behind --stream-tagged.
  • Multi-slot gated behind --target-cache-slots=N (default N=1).
  • Batched target step reached only via the SCHED_BATCH_* command
    family; legacy SCHED_STEP keeps using the single-seq path.
  • Hot-loop diagnostic logs (sync_us / step_debug) are gated
    behind DFLASH27B_TIMING_DEBUG / DFLASH27B_STEP_DEBUG so the
    default path is unchanged.
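
Gating hot-loop logs behind an environment variable is commonly done with a check like the one below. This is a sketch under assumptions: the variable names DFLASH27B_TIMING_DEBUG and DFLASH27B_STEP_DEBUG come from the list above, but the rule used here (unset, empty, or "0" means off) and the helper name `env_flag_enabled` are guesses for illustration, not the PR's actual parsing.

```cpp
#include <cstdlib>
#include <cstring>

// Returns true when the variable is set to something other than "" or "0".
// Assumption: the PR may use a different rule, e.g. presence only.
inline bool env_flag_enabled(const char* name) {
    const char* v = std::getenv(name);
    return v != nullptr && v[0] != '\0' && std::strcmp(v, "0") != 0;
}
```

Reading the flag once at startup into a `static const bool` and branching on that in the decode loop keeps the default path free of per-step `getenv` calls.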

Verification vs existing community PRs

Notes

  • Diff size warning: this branch was extracted from a working tree
    that drifted from main. If a hunk fails to apply on a fresh
    rebase or you spot anything off, ping me and I'll fix on the
    spot rather than push through.
  • Companion branches with smaller follow-ups (CMake sm_89 / BSA,
    gguf_draft_loader fallback, FP16 safetensors drafter, daemon
    scripts improvements, SWA mask wiring + contract test, PFlash
    operator notes) are sitting on
    https://github.com/javierpazo/lucebox-hub. Holding off on
    opening those until this one is in a known state.

Javier Pazó · @xabicasa · xabicasa@gmail.com

@javierpazo javierpazo changed the title dflash: native multi-request scheduler with batched target step feat(dflash): native multi-request scheduler with batched target step May 9, 2026
@davide221

Contributor

Amazing contribution @javierpazo, thank you! Can you resolve the conflicts?

This change brings concurrent multi-request execution to test_dflash
on a single GPU. It is internally one cohesive unit but can be split
into four conceptual pieces if a smaller review is preferred:

1. Multi TargetCache slots
   - CLI: --target-cache-slots=N (alias --cache-slots=N)
   - prefix `SLOT <id>` routes commands to a specific slot
   - DaemonSlotState + RAII ActiveDaemonSlot for safe switching
   - LIST_TARGET_CACHE_SLOTS for introspection
   - all slots share target/draft weights; only KV/SSM/scratch is
     per-slot
   - create_target_cache gains an `n_seqs` parameter so a single
     cache can be allocated batched up front

2. Tagged stream protocol (opt-in)
   - --stream-tagged emits frames `[-2, request_id, token]` instead
     of bare int32 tokens; sentinels `-4` (CONTINUE), `-1` (DONE)
   - parser recognises `REQ <id>` / `REQUEST <id>` headers
   - legacy bare-int32 streaming is unchanged when the flag is off
   - this lets a client demux multiple concurrent requests over the
     same stdout

3. Native quantum scheduler
   - dispatch table for REQ/SLOT/START, SCHED_STEP, SCHED_DRAIN,
     LIST_REQUESTS
   - cursor-based fair round-robin between admitted requests
   - non-blocking reader thread admits new requests during a drain
   - PendingQuantum{slot, req, epoch, n_gen} carries the unit of work
   - CONTINUE / CONT resumes a slot without re-prefilling
   - REQ <id> CANCEL invalidates a request and bumps the slot epoch
     so a stale CONTINUE is rejected; RESTORE_CHAIN / legacy generate
     refuse to overwrite a slot that is owned by an active scheduler
     request

4. Fused batched target step (CUDA path)
   - new commands: SCHED_BATCH_PEEK, SCHED_BATCH_PROBE,
     SCHED_BATCH_TARGET_TAIL, SCHED_BATCH_TARGET_STEP,
     SCHED_BATCH_DRAIN
   - QwenGraphInputs gains `n_seqs`; build_delta_net_block accepts
     n_seqs > 1
   - target_feat is allocated as [5*hidden, target_feat_cap, n_seqs]
     when batched and the chain forwards capture features per-seq
   - batch_probe_compare_ok smoke shows mismatches=0 vs the
     single-seq path; SCHED_BATCH_TARGET_TAIL commits two completed
     pending quanta in 29.26 ms; SCHED_BATCH_TARGET_STEP commits the
     next batched step in 29.57 ms; SCHED_BATCH_DRAIN completes
     req12/req13 with two batched steps each
   - rollback for partially accepted draft tokens, multi-token verify
     and parent-id propagation in the batched path are noted as
     follow-ups; today the batched step accepts the cleanest case
     and falls back to single-seq when needed

Validation (single GPU1 RTX 6000 Ada sm_89, Heretic Q4_K_M target +
Q8 GGUF or FP16 safetensors drafter, FA_WINDOW=0, KV q4_0/q4_0):

- Two concurrent requests:
    REQ 4 START SLOT 0 quantum=2
    REQ 5 START SLOT 1 quantum=2
    SCHED_DRAIN closes both clean.
    slot 0: 18.41 tok/s, slot 1: 22.50 tok/s
- Mid-drain admission of REQ 6 succeeds; CONTINUE on slot 0 resumes
  without re-prefill.
- batch_probe_compare_ok mismatches=0 over a 2-seq probe.
- batch_tail_commit count=2 ms=29.26.
- batch_step_commit ms=29.57 followed by SCHED_DRAIN reverts cleanly
  back to the DFlash single-seq path.

Compatibility:
- All new behaviour is opt-in. Default invocation of test_dflash
  with no scheduler flags keeps the legacy single-request path.
- Tagged stream is gated behind --stream-tagged.
- Multi-slot is gated behind --target-cache-slots=N (default N=1).
- Batched target step is reached only via the SCHED_BATCH_* command
  family; legacy SCHED_STEP keeps using the single-seq path.
- Hot-loop diagnostic logs (sync_us / step_debug) are now gated
  behind DFLASH27B_TIMING_DEBUG / DFLASH27B_STEP_DEBUG so the
  default path is unchanged.

Verification vs existing community PRs:
- No prior art in lucebox-hub for the SCHED_BATCH_* protocol or for
  a native C++ quantum scheduler with REQ/SLOT/CONTINUE/CANCEL +
  epoch hardening. Checked against PR Luce-Org#39 (CUDA graph reuse) and
  PR Luce-Org#62 (split target/draft StepGraphs); both reuse / split graphs
  but neither exposes a multi-request slot protocol.
- No upstream collision found for tagged stream framing or
  --target-cache-slots.

Happy to split this into four sequential PRs (slots / tagged stream /
quantum scheduler / batched target step) if a smaller-grained review
is preferred — let me know.

Author: Javier Pazo <xabicasa@gmail.com>
@javierpazo javierpazo force-pushed the xabicasa/dflash-multi-request-scheduler-batched-target-step branch from a28b609 to 561b0ac Compare May 10, 2026 19:23
@javierpazo

Contributor Author

@davide221 thanks! Rebased on top of fresh main, conflicts resolved. The two collisions were tiny (a comment in internal.h::DraftLayer::is_swa and the [cfg] log line in test_dflash.cpp); merged the two log lines so both the upstream draft_swa / draft_ctx_max / target_split_dflash flags and this PR's target_cache_slots / stream_tagged show up. The big ~900-line block from upstream's new target-split path is preserved untouched.

easel pushed a commit to easel/lucebox-hub that referenced this pull request May 27, 2026
Record the 2026-05-27 18:17 scheduled run, including fresh PR classification and targeted worktree/Codex probes for stale PRs Luce-Org#137 and Luce-Org#135.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 28, 2026
Revalidate PRs Luce-Org#137 and Luce-Org#135 in isolated worktrees, record conflict shape and current-layout recommendations. Upstream main remains unchanged at 4f4d82e.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 28, 2026
Record a fresh PR Luce-Org#135 conflict probe and tmux-driven Codex feasibility report. The stack remains aligned with origin/main; no code changes were integrated.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 28, 2026
Refresh unattended integration manifest after fresh direct merge probes and a Codex feasibility pass for PR Luce-Org#135. No source stack changes.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-28 20:23 cron revalidation: upstream and carried PR heads remain current, draft Luce-Org#304 is excluded, fresh conflicted-PR probes were retained, and a tmux-driven Codex inspection keeps Luce-Org#135 as a designed current-layout port instead of a mechanical merge.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-28 23:28 cron pass, repeated conflict probes, and the Codex salvage assessment for PR Luce-Org#135. No source stack rewrite was needed because origin/main and carried non-draft PR heads were already current.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-29 00:41 cron preflight, open PR classification, repeat worktree merge probes, and fresh delegated PR Luce-Org#135 attempts. No source stack rewrite was needed because origin/main and carried mergeable PR heads were already included.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-29 01:00 EDT unattended integration run, fresh merge probes, and Codex feasibility output for PR Luce-Org#135.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-29 01:51 cron preflight, fresh PR probes, and the new Luce-Org#135 Claude/Codex delegation results. No PR stack code changes were needed because all safe non-draft heads are already ancestors of easel/auto-integration.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-29 02:51 cron refresh, current PR classifications, fresh conflict probes, and tmux delegation outcomes for PRs Luce-Org#237 and Luce-Org#135.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-29 03:22 cron refresh, fresh conflict probes for remaining non-draft PRs, and a tmux-driven Codex feasibility review for PR Luce-Org#135.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-29 06:29 unattended refresh, including fresh conflict probes for the remaining non-draft selective-port candidates and an inconclusive Luce-Org#135 delegated review attempt.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-29 07:22 EDT unattended refresh, including current PR containment, fresh conflict probes, and the Codex feasibility report for PR Luce-Org#135. No contributor code changed in this refresh.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-29 09:42 cron preflight, fresh conflicted-PR probes, and tmux-driven Claude/Codex delegation results for PRs Luce-Org#237 and Luce-Org#135.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the latest cron preflight, direct conflict probes, and the new Codex salvage report for PR Luce-Org#135. No contributor code changed; all cleanly mergeable non-draft PR heads remain contained in the stack.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-29 13:12 EDT reconciliation pass: all mergeable non-draft PR heads remain included, remaining old-layout probes still conflict, Claude Luce-Org#237 again hit its turn limit, and Codex Luce-Org#135 produced a usable selective-port target list.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the 2026-05-29 19:20 unattended probe run, including fresh direct-merge conflict probes for unresolved old-layout PRs and a tmux Codex feasibility report for PR Luce-Org#135.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 30, 2026
Record the 2026-05-30 02:26 cron preflight, fresh direct-merge probes for remaining old-layout PRs, and the tmux Codex Luce-Org#135 feasibility report. No source code changed.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 30, 2026
Record the 2026-05-30 04:18 unattended integration pass, exact PR containment checks, renewed conflict probes, and the Codex Luce-Org#135 selective-port assessment.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 30, 2026
Record the 2026-05-30 05:29 unattended integration run, including refreshed PR containment, conflict probes, the read-only Luce-Org#135 Codex delegation attempt, verification commands, and retained worktree paths.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 30, 2026
Record the 2026-05-30 06:06 unattended run, unchanged PR containment, fresh conflict probes, and the Codex Luce-Org#135 selective-port assessment.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 30, 2026
Record the 2026-05-30 07:24 cron pass, fresh conflict probes, and the Codex Luce-Org#135 selective-port plan. No PR stack code changes were needed.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 30, 2026
Record the 2026-05-30 17:51 cron preflight/probe pass, including fresh conflict probes for the remaining non-integrated PRs and a tmux/Codex feasibility audit for PR Luce-Org#135.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Record 2026-05-31 04:10 cron reconciliation: no new PR heads, fresh conflict probes for Luce-Org#305/Luce-Org#237/Luce-Org#221/Luce-Org#154/Luce-Org#153/Luce-Org#135, and failed read-only delegation attempts for Luce-Org#237.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Record the 2026-05-31 05:55 unattended refresh, fresh probes for the remaining non-ancestor PRs, and the completed Codex feasibility review for PR Luce-Org#135.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Port a minimal PR Luce-Org#135 salvage slice into the current qwen35 target graph. Adds n_seqs-aware prefill-only cache allocation and graph guards while preserving existing n_seqs=1 behavior. Full scheduler/copyback commands remain unported pending build/runtime validation.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Record the 2026-05-31 06:49 cron reconciliation, Luce-Org#135 delegated selective-port attempt, validation outcomes, retained worktrees, and remaining PR classifications.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Selectively ports a small PR Luce-Org#135 request-framing slice onto the current server/test_dflash harness. Legacy stream output remains unchanged unless --stream-tagged is enabled.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Port a narrow PR Luce-Org#135 slice onto the current qwen35 target graph. Batched prefill caches now allocate target_feat with a sequence dimension, the batched graph accepts feature capture when cache dimensions match, and capture copies are emitted per sequence while preserving the existing single-sequence buffer layout. Validation: git diff --check. Full CMake validation remains blocked locally by missing populated ggml headers under server/deps/llama.cpp and the known CUDA compiler-id sm_52 toolchain issue.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Record the 2026-05-31 09:41 cron preflight, current PR-head containment, fresh merge-probe conflict counts, and the no-edit Luce-Org#135 Claude/Codex review findings. No source code changed in this refresh.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Record the 2026-05-31 11:01 cron pass: exact PR-head containment stayed unchanged at 21 included non-draft PRs, six selective-port candidates remain, and a tmux-driven Codex attempt for the next PR Luce-Org#135 multi-cache-slot slice left the conflicted probe unresolved with no source changes promoted.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Port a narrow slice of PR Luce-Org#135 into the current stack: daemon cache-slot parsing, independent extra TargetCache state, graph/feature-mirror swapping, and cleanup handling. Refresh auto-integration manifest after merging advanced PR Luce-Org#285.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Port a narrow PR Luce-Org#135 daemon-cache-slot follow-up into the current stack: LIST_TARGET_CACHE_SLOTS / LIST_CACHE_SLOTS now report slot count, active slot, and per-slot readiness/cur_pos/last_tok while respecting the active-slot RAII swap. Refresh the auto-integration manifest with current PR classifications and validation notes.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Port a narrow Luce-Org#135 control-plane slice into the current test_dflash daemon path. Add bounded request bookkeeping for REQ/REQUEST-prefixed calls plus LIST_REQUESTS and CANCEL command scaffolding that records state without enabling live scheduler mutation. Refresh the auto-integration manifest with current PR classifications, probe results, delegation evidence, and local validation.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Continue the Luce-Org#135 selective-port stack with diagnostic-only SCHED_STEP and SCHED_DRAIN daemon commands. They report request counts and active/per-slot target-cache state without mutating live scheduler state. Refresh the auto-integration manifest and record the latest Luce-Org#285 head merge.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Port a diagnostic-only slice from PR Luce-Org#135 into test_dflash: an opt-in aligned scheduler bucket selftest that uses local structs and does not mutate daemon scheduling state. Refresh the auto-integration manifest with current PR classification, probe results, delegation notes, retained worktrees, and validation outcomes.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Port a narrow Luce-Org#135 diagnostic-only slice into the integration stack. SCHED_BATCH_PEEK inspects existing daemon request/cache-slot state and applies the already-carried aligned-bucket selector without mutating scheduler or cache state. Refresh the auto-integration manifest with current probe results.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Record the 2026-06-01 unattended PR integration pass, updated PR Luce-Org#285 head containment, current selective-port conflict counts, and delegated review conclusions for the remaining Luce-Org#321/Luce-Org#325/Luce-Org#135 slices.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Port a narrow PR Luce-Org#135 diagnostic-only daemon command that inspects current request/cache-slot state and reports aligned batch readiness without graph execution, cache copyback, or request-state mutation. Refresh auto-integration metadata and validation notes.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Selective-port a narrow PR Luce-Org#135 cache-reset slice identified by the latest conflicted probe. reset_target_cache now clears TargetCache::last_tok with cur_pos so reused caches cannot retain a stale decode seed.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Record the 2026-06-01 10:14 unattended refresh, direct-merge probes, delegated PR Luce-Org#135/Luce-Org#237 attempts, the promoted Luce-Org#135 cache-reset slice, and validation outcomes.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Refresh the unattended auto-integration manifest after the 2026-06-01 10:34 run. No contributor PR head advanced; direct probes still leave Luce-Org#305, Luce-Org#237, Luce-Org#221, Luce-Org#154, Luce-Org#153, and Luce-Org#135 as selective-port/runtime-validation candidates.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Promote a tiny PR Luce-Org#135 selective-port slice: daemon-mode DFlash snapshot bookkeeping now records the committed generation boundary rather than prompt-plus-output vector length. Refresh the auto-integration manifest with the latest PR classifications, fresh conflict probes, and the tmux-driven Codex feasibility result.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Record the 2026-06-01 16:07 EDT unattended refresh, exact PR-head containment, fresh probe conflict counts, and the Luce-Org#135 Claude feasibility attempt outcome.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Record the 2026-06-01 17:44 cron preflight, exact PR-head containment, fresh direct-merge probes for the six remaining selective-port candidates, and the stopped tmux Codex Luce-Org#135 feasibility attempt.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
Promote a narrow PR Luce-Org#135 salvage slice identified by tmux-driven Codex. The harness can now synthesize prompt token vectors with --synthetic-prompt-tokens/--synthetic-token plus --n-gen/--out, which makes CUDA smoke runs easier without requiring a prompt_ids.bin fixture. Refresh auto-integration manifest with current PR classifications and probe/delegation results.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 2, 2026
Record the 2026-06-01 20:00 unattended reconciliation pass, fresh conflict probes for the six remaining selective-port candidates, and the latest Luce-Org#135 Codex feasibility result.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 2, 2026
Record the 2026-06-02 01:07 unattended run: no new non-draft PR heads advanced, direct-merge probes still conflict for Luce-Org#305/Luce-Org#237/Luce-Org#221/Luce-Org#154/Luce-Org#153/Luce-Org#135, and a tmux Codex Luce-Org#221 pass found only the already-represented gguf_metadata header slice.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 2, 2026
Promote a small PR Luce-Org#135 salvage slice in the current Qwen35 target-cache layout: initial rollback-cache SSM intermediates now use F16, matching migrate_prefill_cache and the existing typed rollback readers. Refresh the auto-integration manifest with the probe/delegation evidence.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 2, 2026
Record the latest open-PR containment check and direct-merge probes. Document the Luce-Org#135 Codex feasibility pass, which found no remaining safe slice after the prior rollback-cache salvage.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 2, 2026
Port a narrow safe cleanup slice identified from PR Luce-Org#135: if the daemon test lazily loaded the pFlash drafter and did not receive an explicit free command, release it before final graph/cache teardown. Refresh the auto-integration manifest with the current PR classification, direct-merge conflict counts, and the tmux-driven Codex feasibility result.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 2, 2026
Record the 2026-06-02 07:27 cron refresh after merging origin/main a81128b into the auto-integration stack. Reconfirm open PR accounting, retained conflicted candidates, direct-merge probe counts, and the Codex no-safe-slice review for PR Luce-Org#135.
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 2, 2026
Record the 10:03 cron pass: current refs, open PR accounting, fresh conflict probes for the six remaining selective-port candidates, and a tmux-driven Codex NO_SAFE_SLICE review for PR Luce-Org#135. No source changes were promoted.