feat(server meta): expose SWA geometry in /v1/models meta by marksverdhei · Pull Request #132 · heiervang-technologies/ht-llama.cpp

marksverdhei · 2026-06-17T19:04:22Z

Closes #123.

What

Adds 3 new fields to the /v1/models meta object so clients (heierchat VRAM panel) can compute exact KV bytes/token for SWA-hybrid models (gemma-4 12B/31B, gemma-4-12b-256k@gem) instead of the current safe-upper-bound estimate.

New meta fields (per loaded model)

| Field | Source | Example (gemma-4 12B) |
| --- | --- | --- |
| `n_swa` | `llama_model_n_swa(model)` | `1024` |
| `n_swa_layers` | new `llama_model_n_swa_layers(model)` | `40` |
| `swa_type` | new `llama_model_swa_type_name(model)` | `"standard"` |

New public API

LLAMA_API int32_t    llama_model_n_swa_layers(const struct llama_model * model);
LLAMA_API const char * llama_model_swa_type_name(const struct llama_model * model);

Exact KV formula the FE will compute

kv_per_tok(ctx) = (k_gqa + v_gqa) * kv_type_bytes * (
                    n_full_layers
                  + n_swa_layers * min(ctx, n_swa) / ctx )

where n_full_layers = n_layer - n_swa_layers.
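
The formula above can be sketched as a small function. This is an illustration under stated assumptions, not the frontend's code: the `kv_geometry` struct and `kv_bytes_per_token` are hypothetical names, `k_gqa` and `v_gqa` are taken to be per-layer element counts per token, and `kv_type_bytes` is the size of one cached element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical container for the values a client would read from the meta object.
struct kv_geometry {
    uint32_t n_layer;       // total layer count
    uint32_t n_swa_layers;  // layers using sliding-window attention
    uint32_t n_swa;         // sliding window size in tokens
    uint32_t k_gqa;         // K elements per token per layer (assumed unit)
    uint32_t v_gqa;         // V elements per token per layer (assumed unit)
    double   kv_type_bytes; // bytes per cached element, e.g. 2.0 for f16
};

// Average KV bytes per token at context length ctx.
// Full-attention layers cache every token; SWA layers cache at most n_swa tokens.
double kv_bytes_per_token(const kv_geometry & g, uint32_t ctx) {
    assert(ctx > 0);
    assert(g.n_swa_layers <= g.n_layer);

    const double n_full_layers = g.n_layer - g.n_swa_layers;
    const double window        = std::min(ctx, g.n_swa);
    const double eff_layers    = n_full_layers + g.n_swa_layers * window / ctx;

    return (g.k_gqa + g.v_gqa) * g.kv_type_bytes * eff_layers;
}
```

As a worked example with made-up numbers (only `n_swa = 1024` and `n_swa_layers = 40` come from the table above; `n_layer = 48`, `k_gqa = v_gqa = 1024` and f16 storage are assumptions): at `ctx = 8192` the effective layer count is 8 + 40 * 1024 / 8192 = 13, giving 2048 * 2 * 13 = 53248 bytes per token. At `ctx = 512` the window is not yet full, so all 48 layers count and the result is 196608 bytes per token.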

Verification

CPU-only build (-DGGML_CUDA=OFF): compiles clean, zero warnings
Purely additive — no behavior change until consumers read the fields

Files changed (4, +36 lines)

include/llama.h — 2 new API declarations
src/llama-model.cpp — implementations (layer-count loop + enum→string)
tools/server/server-context.h — 3 new fields in server_context_meta
tools/server/server-context.cpp — populate + emit in /v1/models JSON

Add three new fields to the /v1/models meta object so clients can compute exact KV bytes/token for SWA-hybrid models (gemma-4 etc.):

n_swa: sliding window size (already in API, now in meta)
n_swa_layers: count of SWA layers in the model
swa_type: "none" | "standard" | "chunked" | "symmetric"

New public API:

llama_model_n_swa_layers(model): loops hparams.is_swa(il)
llama_model_swa_type_name(model): enum-to-string for swa_type

With these, clients compute the exact formula:

kv/tok = (k_gqa + v_gqa) * kv_bytes * ( n_full_layers + n_swa_layers * min(ctx, n_swa) / ctx )

where n_full_layers = n_layer - n_swa_layers.

Previously the best a client could do was assume all layers scale with ctx (correct for dense, 2-8x over-estimate for SWA-hybrid).

Purely additive: no behavior change until consumers read the fields.

Closes #123.
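
To make the over-estimate concrete, the ratio between the old "all layers scale with ctx" bound and the exact figure depends only on layer counts, window size and context length. The helper below is a hedged sketch; `kv_overestimate_ratio` is a hypothetical name and the sample values in the note are illustrative, not measurements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Ratio of the upper-bound estimate (every layer caches ctx tokens)
// to the exact estimate (SWA layers cache at most n_swa tokens).
// The (k_gqa + v_gqa) * kv_bytes factor cancels out.
double kv_overestimate_ratio(uint32_t n_layer, uint32_t n_swa_layers,
                             uint32_t n_swa, uint32_t ctx) {
    assert(ctx > 0 && n_layer > 0 && n_swa_layers <= n_layer);

    const double window     = std::min(ctx, n_swa);
    const double eff_layers = (n_layer - n_swa_layers) + n_swa_layers * window / ctx;
    assert(eff_layers > 0.0);

    return n_layer / eff_layers;
}
```

For an assumed model with 48 layers, 40 of them SWA with a 1024-token window, the ratio at `ctx = 8192` is 48 / 13, roughly 3.7x. For a dense model, or whenever `ctx <= n_swa`, the ratio is exactly 1.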

marksverdhei merged commit 765d7fb into ht Jun 17, 2026
9 checks passed

marksverdhei deleted the feat/swa-geometry-meta branch June 17, 2026 19:14

marksverdhei mentioned this pull request Jun 17, 2026 in Hivemind Maintenance Tasks Epoch 1 #112 (open, 8 tasks)
