feat(dflash): add /props introspection endpoint by easel · Pull Request #190 · Luce-Org/lucebox-hub

easel · 2026-05-13T23:49:36Z

Motivation

I’m working on a set of agent workflow benchmarks and want to capture enough runtime metadata per run to make benchmark cells self-describing: model path, context window, sampling defaults, speculative mode, KV cache settings, pflash/cache state, and daemon liveness. This adds a llama.cpp-style /props endpoint for that purpose, with lucebox-specific details kept as structured extensions.

Summary

Adds a read-only GET /props endpoint returning a JSON snapshot of the live Python-server state for bench-time capture and diagnostics.

The endpoint now uses the cross-server / llama.cpp-compatible shape expected by downstream runtime-props capture:

top-level default_generation_settings.{n_ctx, temperature, top_p, top_k, min_p, repeat_penalty}
top-level model_alias, model_path, and build_info
top-level speculative_mode with off, dflash, or pflash
runtime.backend plus lucebox runtime extensions such as KV cache types, FA window, lazy draft, and target sharding
reasoning.{supported, default, supported_efforts}
sampling.capabilities for operator-visible request parameter support flags

Old aliases are intentionally not emitted: runtime.max_ctx, model.id, model.target_path, flat sampling.supports_*, and reasoning.default_enabled.

Response Shape

Top-level keys:

{ default_generation_settings, model_alias, model_path, build_info, speculative_mode, server, model, runtime, reasoning, speculative, sampling, pflash, prefix_cache, full_cache, tool_replay, daemon, api }

Each section is scoped to Python-server state. Daemon build identity, request-rate metrics, and per-endpoint parameter schemas remain v1 non-goals.
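For bench-time capture, a client could check the payload against the key list above and flatten a few fields into a run record. This is a minimal sketch that assumes the payload has already been fetched and decoded into a dict; `missing_top_level_keys` and `summarize_props` are hypothetical helper names, and only the key names are taken from this description.

```python
from typing import Any

# Top-level keys as listed in the Response Shape section.
EXPECTED_TOP_LEVEL_KEYS = frozenset({
    "default_generation_settings", "model_alias", "model_path", "build_info",
    "speculative_mode", "server", "model", "runtime", "reasoning",
    "speculative", "sampling", "pflash", "prefix_cache", "full_cache",
    "tool_replay", "daemon", "api",
})


def missing_top_level_keys(props: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return expected keys absent from a /props payload."""
    return sorted(EXPECTED_TOP_LEVEL_KEYS - props.keys())


def summarize_props(props: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: flatten a few fields into a benchmark record."""
    settings = props.get("default_generation_settings", {})
    return {
        "model_path": props.get("model_path"),
        "n_ctx": settings.get("n_ctx"),
        "temperature": settings.get("temperature"),
        "speculative_mode": props.get("speculative_mode"),
        "backend": props.get("runtime", {}).get("backend"),
    }
```

Using `.get()` throughout keeps the capture step tolerant of servers that omit a section, which matters if the same code reads `/props` from more than one server implementation.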

Design Notes

server.version is read from dflash/pyproject.toml via stdlib tomllib; malformed pyproject logs a warning and falls back to "0.0.0+unknown".
server.props_schema = 1 remains the compatibility marker for /props parsers.
runtime.kv_cache_k/v report effective daemon KV types via _effective_kv_type().
runtime.backend is best-effort: DFLASH_RUNTIME_BACKEND / DFLASH27B_GPU_BACKEND, then nearby CMakeCache.txt, then cuda fallback.
speculative_mode is pflash when pflash is enabled, otherwise dflash when DDTree speculative decode is supported, otherwise off.
prefix_cache and full_cache expose cumulative lifetime hit counters.
full_cache.disk_bytes is snapshotted on mutation so /props does not walk the filesystem on read.
tool_replay reports the exact tool-call replay memory counters from ToolMemory.
api.endpoints is hand-curated with a drift test against FastAPI routes.
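Two of the notes above are precedence chains, which can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the function names are hypothetical, the relative order of the two environment variables is assumed, and the CMakeCache.txt lookup is represented by an already-resolved optional value rather than implemented.

```python
from typing import Mapping, Optional


def select_speculative_mode(pflash_enabled: bool, ddtree_supported: bool) -> str:
    """pflash wins when enabled, then dflash when DDTree is supported, else off."""
    if pflash_enabled:
        return "pflash"
    if ddtree_supported:
        return "dflash"
    return "off"


def resolve_backend(env: Mapping[str, str],
                    cmake_backend: Optional[str] = None) -> str:
    """Best-effort backend: env vars, then a CMakeCache-derived value, then cuda.

    `cmake_backend` stands in for whatever the CMakeCache.txt lookup yields;
    how that file is located and parsed is not shown here.
    """
    # Assumed order: DFLASH_RUNTIME_BACKEND is consulted before DFLASH27B_GPU_BACKEND.
    for name in ("DFLASH_RUNTIME_BACKEND", "DFLASH27B_GPU_BACKEND"):
        value = env.get(name, "").strip()
        if value:
            return value
    if cmake_backend:
        return cmake_backend
    return "cuda"
```

Passing the environment in as a mapping, rather than reading `os.environ` directly, makes each step of the chain easy to exercise in a unit test.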

Tests

Focused server suite passes:

uv run --extra dev pytest dflash/scripts/test_server.py -q — 75 passed

Covered areas include endpoint shape, removed old aliases, default generation settings, reasoning shape, speculative mode selection, backend resolution, arch gating, pflash toggle, target-sharding cache behavior, endpoint-list drift, KV type resolution, cache counters, full-cache disk-byte snapshots, and ToolMemory.stats().

Open Items

Run curl http://localhost:8000/props | jq . against a live server after deployment/restart and sanity-check runtime values.

🤖 Generated with Claude Code

cubic-dev-ai

2 issues found across 5 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="dflash/scripts/server.py">

<violation number="1" location="dflash/scripts/server.py:150">
P2: Unvalidated `float()` parsing of `DFLASH_FP_ALPHA` can crash `/props` on malformed env values</violation>

<violation number="2" location="dflash/scripts/server.py:926">
P2: /props misreports target_sharding for laguna by checking requested extra_daemon_args instead of effective daemon args</violation>
</file>

GET /props returns a single read-only JSON document describing the live Python-server state (model arch, KV/FA config, pflash mode, cache occupancy, daemon liveness) for bench-time capture and diagnostics. Matches llama.cpp's /props convention; modeled after antirez/ds4 PR Luce-Org#81.

Shape sections: server / model / runtime / reasoning / speculative / sampling / pflash / prefix_cache / full_cache / tool_replay / daemon / api. Field-by-field rationale lives in dflash/docs/props_endpoint_plan.md.

Implementation notes:
- server.version is read from dflash/pyproject.toml via stdlib tomllib; importlib.metadata is skipped because the workspace declares [tool.uv] package=false (never installed as a wheel).
- props_schema=1 is a separate compat marker for clients that parse /props programmatically. Bump rules live in a comment by the constant.
- Arch-gated capability booleans (reasoning_supported, speculative_supported, tools_supported) flow through a single _capabilities() helper so /props and the Codex /v1/models variant cannot drift.
- runtime.kv_cache_k/v come from a new _effective_kv_type() that mirrors the C++ resolve_kv_types() rules (qwen35 default Q4_0, laguna default Q8_0, per-arch precedence chains). Distinct from _resolve_kv_k_type(), which remains a stable hash salt for the prefix cache.
- prefix_cache and full_cache now carry cumulative _lifetime_hits counters incremented at the existing hit sites; they survive eviction unlike per-entry hit counts.
- full_cache.disk_bytes is snapshotted on every mutation (confirm_full_snap, _retire_full_entry, rehydrate_full_cache) so /props never has to walk the filesystem on read.
- ToolMemory.stats() returns counters under no lock; cross-field tear is acceptable for introspection, documented in a comment.

Tests (17 new, all passing alongside the 54 existing baseline tests):
- Shape / version / version fallback
- Arch gating (qwen35, laguna)
- pflash enabled/disabled toggle
- target-sharding disables both cache layers
- api.endpoints drift detector vs actual FastAPI routes
- _capabilities helper
- _effective_kv_type per-arch + per-axis behavior
- PrefixCache lifetime_hits survives eviction
- full_cache disk_bytes refreshes on add and on retire
- ToolMemory.stats() reflects current entries/bytes

Explicit v1 non-goals (see plan doc): no /metrics, no daemon build identity, no per-endpoint param schemas, no daemon PID/uptime/bin_path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
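The difference between per-entry hit counts and a cumulative lifetime counter can be shown with a toy LRU cache. The class below is illustrative only and is not the PrefixCache from this PR; the name `TinyPrefixCache`, its methods, and the stats keys are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import Any, Optional


class TinyPrefixCache:
    """Toy LRU cache that tracks per-entry hits and a cumulative lifetime total."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, list]" = OrderedDict()  # key -> [value, hits]
        self._lifetime_hits = 0

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = [value, 0]
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            # Evict the least recently used entry; its per-entry hit count is lost.
            self._entries.popitem(last=False)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        entry[1] += 1
        self._lifetime_hits += 1  # incremented at the hit site, never decremented
        self._entries.move_to_end(key)
        return entry[0]

    def stats(self) -> dict:
        return {
            "entries": len(self._entries),
            "entry_hits": sum(hits for _, hits in self._entries.values()),
            "lifetime_hits": self._lifetime_hits,
        }
```

After an eviction, `entry_hits` drops because the evicted entry's count goes with it, while `lifetime_hits` keeps the full total, which is the property a long-running benchmark capture needs.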

easel · 2026-05-14T00:41:27Z

@cubic-dev-ai thanks for the review — both flagged against commit 5600c93. Status on current HEAD (f6f8e97):

Issue 1 (server.py:150, unvalidated float(DFLASH_FP_ALPHA)) — already addressed in the first amendment (d492983). The parse goes through a new _parse_optional_float() helper that returns None and logs a warning on non-numeric values rather than raising at request time.

Issue 2 (server.py:926, target_sharding misreport on laguna) — valid bug, fixed in f6f8e97. Root cause: the laguna daemon-spawn path (if arch in _LAGUNA_ARCHES: cmd = [...]) doesn't call cmd.extend(extra_daemon_args) — that branch is qwen35-only. So on arch=laguna with --target-gpus=... passed, the flag is silently dropped at spawn but /props was still reporting target_sharding: True. Now gated:

"target_sharding": (
    arch not in _LAGUNA_ARCHES
    and _extra_daemon_has_target_sharding(extra_daemon_args)
),

Added test_props_target_sharding_false_on_laguna_even_when_args_passed to lock the behavior in. 72/72 tests passing.
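A self-contained sketch of the gating logic and the behavior the new test locks in is shown below. The contents of `_LAGUNA_ARCHES`, the body of `_extra_daemon_has_target_sharding`, and the wrapper `reported_target_sharding` are assumptions for illustration; only the names of the first two and the `--target-gpus` flag appear in this thread.

```python
from typing import Sequence

# Assumed contents; the real set lives in dflash/scripts/server.py.
_LAGUNA_ARCHES = frozenset({"laguna"})


def _extra_daemon_has_target_sharding(extra_daemon_args: Sequence[str]) -> bool:
    """Assumed detection rule: look for a --target-gpus flag in either form."""
    return any(
        arg == "--target-gpus" or arg.startswith("--target-gpus=")
        for arg in extra_daemon_args
    )


def reported_target_sharding(arch: str, extra_daemon_args: Sequence[str]) -> bool:
    """Hypothetical wrapper around the gated expression shown above.

    Sharding is reported only when the arch's spawn path actually forwards
    the extra daemon args, so the value reflects effective daemon state.
    """
    return (
        arch not in _LAGUNA_ARCHES
        and _extra_daemon_has_target_sharding(extra_daemon_args)
    )
```

The design point is that `/props` should describe what the daemon was actually started with, not what the operator requested.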

cubic-dev-ai · 2026-05-14T00:42:34Z

@easel Both fixes look correct in the current diff — nice work.

Issue 1 — _parse_optional_float is clean. Wraps the bare float() call in a try/except with a logged warning instead of crashing, returns None for empty/missing/non-numeric values. pflash.bsa_alpha properly shows null in the disabled path too.

Issue 2 — the arch not in _LAGUNA_ARCHES guard is the right fix. Since the laguna branch discards extra_daemon_args entirely, reflecting that in /props avoids a misleading report. The test locking this in (test_props_target_sharding_false_on_laguna_even_when_args_passed) is good — it'd catch a future refactor that accidentally wires extra_daemon_args into the laguna spawn without updating the reporting logic.

No further concerns from my side. 72/72 ✅

easel · 2026-05-22T18:51:24Z

Superseded by integration PR #226 via 5b67cf2 (server props + thinking controls reworked into the integration line).

cubic-dev-ai Bot reviewed May 13, 2026

Comment thread dflash/scripts/server.py Outdated

Comment thread dflash/scripts/server.py Outdated

easel force-pushed the feat/props-endpoint branch from 5600c93 to d492983 Compare May 14, 2026 00:20

easel marked this pull request as draft May 14, 2026 00:21

easel force-pushed the feat/props-endpoint branch 2 times, most recently from b93232c to 87888dc Compare May 14, 2026 00:39

easel force-pushed the feat/props-endpoint branch from 87888dc to f6f8e97 Compare May 14, 2026 00:41

fix(dflash): align /props runtime schema

3ff4e12

easel closed this May 22, 2026

easel wants to merge 2 commits into Luce-Org:main from easel:feat/props-endpoint
