chore(repo): rename dflash→server, group pflash+megakernel under optimizations/ #281
Merged
easel added a commit to easel/lucebox-hub that referenced this pull request on May 26, 2026
…name)

Brings in PR Luce-Org#281 (chore: rename dflash→server, pflash+megakernel → optimizations/) + small docs polish 080f89b.

Our lucebox/ Python package (added by us in 2560086, never upstream) is untouched. Our docs additions under dflash/docs/* are migrated to server/docs/*. Our deletions of bench scripts are confirmed against the new server/scripts/* paths.

Workspace members in pyproject.toml: ["server", "lucebox", "optimizations/megakernel", "optimizations/pflash"], preserving our lucebox member alongside upstream's renamed paths.

# Conflicts:
#	README.md
#	pyproject.toml
#	server/docs/BENCHMARK_SNAPSHOT_SPEC.md
#	server/docs/experiments/cache-impact-2026-05-24.md
#	server/docs/experiments/gemma4-26b-thinking-control-2026-05-25.md
#	server/docs/experiments/kv-cache-q4-vs-tq3-2026-05-25.md
#	server/docs/experiments/thinking-control-protocol.md
#	server/docs/experiments/thinking-mechanism-explainer.md
#	server/docs/run-requests/area-swe-bench-integration.md
#	server/docs/run-requests/bragi-gemma4-laguna-config-issues.md
#	server/docs/run-requests/forge-vs-vidar-ds4f.md
#	server/docs/run-requests/luce-dflash-think-92.md
#	server/docs/run-requests/qwen36-budget-signaling-overhaul.md
#	server/docs/run-requests/qwen36-hard-limit-reply-budget-bump.md
#	server/docs/run-requests/sindri-rtx3090ti-qwen36-nothink-92.md
#	server/scripts/bench_agent.py
#	server/scripts/bench_agent_loop.py
#	server/scripts/bench_daemon.py
#	server/scripts/bench_he.py
#	server/scripts/bench_he_http.py
#	server/scripts/bench_llm.py
#	server/scripts/bench_server.py
#	server/scripts/entrypoint.sh
#	server/scripts/fixtures/agent_cases/cases.json
#	server/scripts/server.py
#	server/scripts/test_prefix_cache.py
#	server/scripts/test_server.py
easel added a commit to easel/lucebox-hub that referenced this pull request on May 27, 2026
PR Luce-Org#281 moved dflash/ → server/. The pull_request `paths:` filter still targeted dflash/* — so PRs touching the C++ server code wouldn't trigger the Docker prebuild sanity check. Repoint to server/ so CI catches Dockerfile / source regressions before merge.
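The effect of the stale filter can be illustrated with a small sketch. It approximates a `paths:` entry of the form `dir/**` as a plain prefix test, which is an assumption made for illustration; GitHub's actual glob matching is richer. The changed file names and the `triggers` helper are hypothetical.

```python
def triggers(changed_files: list[str], path_filters: list[str]) -> bool:
    """Approximate 'dir/**' path filters as prefix tests on changed files."""
    prefixes = [p.removesuffix("**") for p in path_filters]
    return any(f.startswith(pre) for f in changed_files for pre in prefixes)


# Hypothetical files touched by a PR after the dflash/ -> server/ move.
changed = ["server/src/main.cpp", "server/scripts/entrypoint.sh"]

print(triggers(changed, ["dflash/**"]))  # old filter: workflow is skipped
print(triggers(changed, ["server/**"]))  # repointed filter: workflow runs
```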
This was referenced May 27, 2026
easel added a commit to easel/lucebox-hub that referenced this pull request on May 27, 2026
…nch in-tree

Collapses 134 commits of `integration/props-uv-squared-clean` onto current main as one reviewable change. Most of the underlying server-side work already landed via separate PRs: thinking-budget v2 + multi-dialect reasoning + sidecars (Luce-Org#269), dflash→server rename + optimizations/ grouping (Luce-Org#281), qwen35moe hybrid CPU/CUDA expert split (Luce-Org#262), and a stream of smaller fixes from bragi over the last week.

What remained in integration is everything *above* the server: the host-side runner, the container image, the benchmark/profile evidence pipeline, the harness for driving real clients, and the luce-bench framework itself.

## What changed

### Docker + host wrapper

- Dockerfile (CUDA 12.8 base; copies server/, lucebox/, harness/, luce-bench/ into one image; wires `python -m lucebench.cli` as the `benchmark` entrypoint subcommand).
- `lucebox.sh` (~470 lines of host bash, zero deps beyond docker + nvidia-smi): `check`, `configure`, `pull`, `download-models`, `serve`, `install`/`start`/`status`/`logs` (user-systemd), `print-run`, `benchmark`, `profile`.
- `.github/workflows/docker.yml` builds + pushes `ghcr.io/luce-org/lucebox-hub` tags (`:cuda12`, `:vX.Y.Z-cuda12`, `:X.Y-cuda12`, `:sha-<short>-cuda12`).
- `server/scripts/entrypoint.sh` resolves draft GGUF by target architecture (gemma4 → gemma drafter, qwen3.6 → dflash-draft-3.6); warns when multiple targets are present in models/.

### lucebox Python package (in-container CLI)

- `lucebox/` workspace member: `cli.py`, `autotune.py` (VRAM-tiered tier selection + per-host config writeback), `config.py` (typed TOML), `download.py`, `docker_run.py`, `host_check.py`, `host_facts.py`, `profile.py` (profile sweep across DFLASH_MAX_CTX × DFLASH_BUDGET, KV cache types, pFlash modes, lazy-draft, prefix-cache slots), `smoke.py`, `types.py`.
- `lucebox/tests/` for the typed surfaces.
- Level1/Level2/Level3 profile gates; sweep results merged back into `~/.lucebox/config.toml` only after capability + ds4-eval/agentic-tools/agentic-session validation gates pass.

### luce-bench in monorepo

- `luce-bench/` workspace member at v0.2.4: the standalone bench framework (areas: ds4-eval, code, longctx, agent, forge; sweep + per-host snapshot output; v0.2.4 includes the forge area's EvalConfig + run_scenario signature realignment).
- `[tool.uv.sources] luce-bench = { workspace = true }` replaces the prior git tag pin.
- `.github/workflows/release-luce-bench.yml` publishes to PyPI from the monorepo on `luce-bench-v*` tags (trusted publisher, `pypi` environment).

### harness workspace

- `harness/` workspace member: client adapters (`claude_code`, `codex`, `opencode`, `hermes`, `pi`, `openclaw`), `client_test_runner.py`, `benchmarks/run_lucebox_vs_llamacpp.sh`, prompts. `lucebox profile` delegates the actual bench runs to harness.

### Bench + profile evidence

- `server/docs/BENCHMARK_SNAPSHOT_SPEC.md`: schema for tuning/profile artifacts. Snapshots themselves live in the standalone `luce-bench-baselines` repo (out of this tree).

### Misc

- Updated CI workflow path filters for `server/` (post-rename).
- README's "Quick start" section, hardware coverage table, env var reference table; minor edits to optimizations READMEs.
- model card sidecar updates landed alongside Luce-Org#269 but kept here at current values (qwen3.6, gemma-4-26b-a4b, gemma-4-31b, laguna-xs.2, `_schema.json`).

## Out of scope / follow-ups

- 31b backend wiring beyond what `share/model_cards/gemma-4-31b-it.json` shipped (working empirically @ 24GB on sindri AR-only; 26b spec-decode path already proven).
- gemma4 MoE expert split (howard0su's PR Luce-Org#262 territory; merged but not applied to gemma4 yet).
- Multi-Token Prediction (upstream PR #23398, draft).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What
Repo layout cleanup. Treat the project as an inference engine.
- `dflash/` → `server/` (the C++/CUDA inference server lives here)
- `pflash/` → `optimizations/pflash/`
- `megakernel/` → `optimizations/megakernel/`

Why

`dflash` is an implementation detail of how the server does spec-decode. The directory at the root of the repo is the inference server as a product surface. `pflash` and `megakernel` are perf optimizations of that server, so they live under `optimizations/`.

What changed

- `git mv` (full rename history preserved)
- `.gitmodules`: submodule paths → `server/deps/llama.cpp`, `server/deps/Block-Sparse-Attention`
- `pyproject.toml`: `[tool.uv.workspace] members` → `["server", "optimizations/megakernel", "optimizations/pflash"]`
- `.github/workflows/ci.yml`: all `dflash/...` build paths → `server/...`
- `scripts/check_uv_workspace.sh`: workspace verification paths
- `harness/clients/*.sh`, `harness/benchmarks/*.sh`: bench script paths

What did NOT change

- `lucebox-dflash`, `pflash`, `qwen35-megakernel-bf16` (less downstream breakage; only directory layout moved)
- `#include` paths (src-relative via CMake `-Isrc`, unaffected by the move)
- `share/model_cards/` runtime lookup (uses `self_bin_dir()`, so it is relative)
- `dflash/deps/...` identifiers (arbitrary; only paths matter)

Tested on lucebox2 (RTX 3090, CUDA 12.6, sm_86)
Server boots, model loads, both AR and dflash spec-decode paths work, telemetry intact, model output sensible.
Coordination note
This conflicts with every open PR touching `dflash/*` (PRs #226, #75, #48, plus weicj's open work). Suggest landing on a coordinated freeze window so contributors can rebase in one pass. Once merged, downstream rebases are mechanical: just replace `dflash/` with `server/` in their changed paths.

🧙 Built with WOZCODE
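The path replacement described in the coordination note can be sketched as a prefix rewrite over the three directory moves listed in this PR. This is an illustrative helper only: `new_path` and the example file names are hypothetical, and it does not replace an actual `git rebase`.

```python
# Old top-level prefix -> new location, as listed in this PR.
RENAMES = [
    ("dflash/", "server/"),
    ("pflash/", "optimizations/pflash/"),
    ("megakernel/", "optimizations/megakernel/"),
]


def new_path(path: str) -> str:
    """Map a pre-rename repo-relative path to its post-rename location."""
    for old, new in RENAMES:
        if path.startswith(old):
            return new + path[len(old):]
    return path  # paths outside the moved directories are unchanged


# Hypothetical changed paths from a downstream branch.
for p in ["dflash/deps/llama.cpp", "pflash/README.md", "README.md"]:
    print(p, "->", new_path(p))
```

Because the new locations do not start with any of the old prefixes, applying `new_path` twice gives the same result as applying it once.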