Allozaur/llama UI restructuring by allozaur · Pull Request #16 · allozaur/llama.cpp

allozaur · 2026-05-15T07:33:52Z

Overview

Additional information

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:

* update test scripts
* align CI behavior between Linux and Android
* remove automatic cancellation after 15 min
* enable cancel-in-progress
* fix ty check issue
* update and fix pylint issues
* update runner so that we are not restricted by the 15-min limit rule
* fix flake8 lint issue
* update runner according to review feedback
* update code according to review feedback
* switch from the llama-cli binary to the llama-completion binary with the -no-cnv flag

Adds RDNA3 support to the CUDA mma FA kernel. For the RDNA3 tensor cores to work with FP16 accumulation for VKQ, the tiles need to be 32 logical units long along the attention-head dimension; for head sizes 80 and 112, which are not divisible by 32, the regular length of 16 with FP32 accumulation is used instead. The longer tiles also enable more efficient transposition for a warp size of 32, which is why they are also used for RDNA4. However, they scramble the data layout of the accumulators along the attention-head dimension. To prevent accidental misuse, I added another entry to ggml_cuda_mma::data_layout. I also tuned the kernel parameters for RDNA3, RDNA4, and CDNA1 in general, and in the process discovered that the kernel can be made to work for head sizes up to 256 on CDNA. For RDNA3/4 I was not able to get better performance than the tile kernel for head sizes > 128.
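A minimal sketch of the RDNA3 tile-length rule described above, assuming the choice depends only on whether the head size is divisible by 32. The names `vkq_tile_config` and `choose_rdna3_vkq_tile` are hypothetical and do not come from the ggml sources; the real kernel also covers RDNA4, CDNA, and other constraints (such as the > 128 head-size limit on RDNA3/4) that are not modeled here.

```cpp
// Hypothetical types and names; a simplified model of the rule in the text,
// not the actual ggml CUDA/HIP kernel code.
struct vkq_tile_config {
    int  tile_len;      // tile length along the attention-head dimension
    bool fp16_vkq_acc;  // true: FP16 accumulation for VKQ, false: FP32
};

constexpr vkq_tile_config choose_rdna3_vkq_tile(int head_size) {
    // Head sizes divisible by 32 (e.g. 64, 128) can use the 32-long tiles
    // that the RDNA3 tensor cores need for FP16 VKQ accumulation.
    if (head_size % 32 == 0) {
        return {32, true};
    }
    // Head sizes such as 80 and 112 fall back to the regular 16-long tiles
    // with FP32 accumulation.
    return {16, false};
}
```

Keeping the decision in a single function makes the fallback for 80 and 112 explicit instead of scattering `% 32` checks across call sites.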


* Support for Codex CLI by skipping unsupported Responses tools
* Warn on skipped Responses tools and preserve gpt-oss apply_patch rejection
* Revert gpt-oss apply_patch special handling

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

Add Aime2026Dataset class, which loads MathArena/aime_2026 from HuggingFace.
30 problems (two sets of 15), single config/split.
Usage: --dataset aime2026

Assisted-by: llama.cpp:local pi


- Rename LLAMA_BUILD_WEBUI -> LLAMA_BUILD_UI (old kept as deprecated)
- Rename LLAMA_USE_PREBUILT_WEBUI -> LLAMA_USE_PREBUILT_UI (old kept as deprecated)
- Backward compat: old vars auto-forward to new ones with DEPRECATION warning
- Rename internal vars: WEBUI_SOURCE -> UI_SOURCE, WEBUI_SOURCE_DIR -> UI_SOURCE_DIR, etc.
- Rename HF bucket: LLAMA_WEBUI_HF_BUCKET -> LLAMA_UI_HF_BUCKET
- Emit both LLAMA_BUILD_WEBUI and LLAMA_BUILD_UI preprocessor defines
- Emit both LLAMA_WEBUI_DEFAULT_ENABLED and LLAMA_UI_DEFAULT_ENABLED

- Add --ui/--no-ui (old --webui/--no-webui kept as deprecated aliases)
- Add --ui-config (old --webui-config kept as deprecated alias)
- Add --ui-config-file (old --webui-config-file kept as deprecated alias)
- Add --ui-mcp-proxy/--no-ui-mcp-proxy (old --webui-mcp-proxy kept as deprecated)
- Add new env vars: LLAMA_ARG_UI, LLAMA_ARG_UI_CONFIG, LLAMA_ARG_UI_CONFIG_FILE, LLAMA_ARG_UI_MCP_PROXY
- C++ struct fields: params.ui, params.ui_config_json, params.ui_mcp_proxy added alongside old fields
- Backward compat: old fields synced to new ones in g_params_to_internals
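The deprecated CLI aliases listed above can be pictured as a lookup from the old spelling to the new one. This is a sketch, not the argument-parsing code in llama.cpp: the table `kDeprecatedFlags` and the helpers `canonical_flag` and `is_deprecated_flag` are hypothetical, and only the flag pairs named in this PR description are included.

```cpp
#include <array>
#include <string_view>
#include <utility>

// Hypothetical alias table (deprecated spelling -> new spelling), limited to
// the flags named in this PR description.
constexpr std::array<std::pair<std::string_view, std::string_view>, 5> kDeprecatedFlags{{
    {"--webui", "--ui"},
    {"--no-webui", "--no-ui"},
    {"--webui-config", "--ui-config"},
    {"--webui-config-file", "--ui-config-file"},
    {"--webui-mcp-proxy", "--ui-mcp-proxy"},
}};

// Map a command-line spelling to its canonical (new) spelling.
constexpr std::string_view canonical_flag(std::string_view flag) {
    for (const auto& entry : kDeprecatedFlags) {
        if (flag == entry.first) {
            return entry.second;
        }
    }
    return flag;  // already canonical, or not one of the renamed flags
}

// True when the flag is one of the old --webui* spellings.
constexpr bool is_deprecated_flag(std::string_view flag) {
    return canonical_flag(flag) != flag;
}
```

Keeping the old spellings as aliases, rather than dropping them, lets existing launch scripts keep working during the transition.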

- Rename json_webui_settings -> json_ui_settings (both kept in server_context_meta)
- Rename params.webui usage -> params.ui (both synced, old still works)
- JSON API emits both "ui"/"ui_settings" and "webui"/"webui_settings" keys
- Server routes use params.ui_mcp_proxy || params.webui_mcp_proxy
- Preprocessor guards use #if defined(LLAMA_BUILD_UI) || defined(LLAMA_BUILD_WEBUI)
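With the dual-define guard described above, a build that still sets only the legacy LLAMA_BUILD_WEBUI define keeps the UI compiled in, and the MCP proxy route check accepts either field. A minimal sketch: `ui_params_sketch`, `kUiCompiledIn`, and `mcp_proxy_enabled` are hypothetical stand-ins for the real server params and route code, and the `#define` only simulates a legacy-configured build.

```cpp
// Simulate a build configured only with the legacy define.
#ifndef LLAMA_BUILD_WEBUI
#define LLAMA_BUILD_WEBUI
#endif

// Guard in the form described in this PR: either define enables the UI.
#if defined(LLAMA_BUILD_UI) || defined(LLAMA_BUILD_WEBUI)
constexpr bool kUiCompiledIn = true;
#else
constexpr bool kUiCompiledIn = false;
#endif

// Hypothetical subset of the server params: new field next to the old one.
struct ui_params_sketch {
    bool ui_mcp_proxy    = false;  // set by --ui-mcp-proxy
    bool webui_mcp_proxy = false;  // set by the deprecated --webui-mcp-proxy
};

// Route-level check: the proxy is enabled if either spelling turned it on.
constexpr bool mcp_proxy_enabled(const ui_params_sketch& p) {
    return p.ui_mcp_proxy || p.webui_mcp_proxy;
}
```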

- Rename webui-build.yml -> ui-build.yml; artifact webui-build -> ui-build
- Rename webui-publish.yml -> ui-publish.yml; var HF_BUCKET_WEBUI_STATIC_OUTPUT -> HF_BUCKET_UI_STATIC_OUTPUT
- Rename server-webui.yml -> server-ui.yml; job webui-build/checks -> ui-build/checks
- Update server.yml: job/artifact refs webui-build -> ui-build
- Update release.yml: all webui-build/publish refs -> ui-build/publish; HF_TOKEN_WEBUI_STATIC_OUTPUT -> HF_TOKEN_UI_STATIC_OUTPUT
- Update server-self-hosted.yml: webui-build -> ui-build
- Update build-self-hosted.yml: HF_WEBUI_VERSION -> HF_UI_VERSION
- Rename webui-download.cmake -> ui-download.cmake (internal refs updated)
- Update labeler.yml: server/webui -> server/ui path label

- Update CODEOWNERS: team ggml-org/llama-webui -> ggml-org/llama-ui, path /tools/server/webui/ -> /tools/ui/
- Update server README.md: CLI tables show --ui flags with deprecated --webui aliases
- Update server README-dev.md: "WebUI" -> "UI", paths updated to tools/ui/

zhiyuan8 and others added 5 commits May 14, 2026 13:58

ggml-hexagon: cpy: add contiguous fast-path in reshape copy (ggml-org#23076)

5c0e946

readme : update bindings (ggml-org#23063)

7155a49

Support for Codex CLI by skipping unsupported Responses tools (ggml-org#23041)

91e84fe

Copilot AI review requested due to automatic review settings May 15, 2026 07:33

Copilot AI reviewed May 15, 2026


github-actions bot added the examples, devops, script, server, build, and server/webui labels May 15, 2026

ServeurpersoCom and others added 15 commits May 15, 2026 11:18

webui: preserve partial response on streaming error (ggml-org#23090)

d528444

reasoning-budget: clone should do a deep-copy (ggml-org#23095)

ac33f03

llama-eval : add AIME 2026 dataset support (ggml-org#23058)

d5dc2e0


webui: Move static build output from tools/server/public to `build/ui` directory

10710a2

refactor: Move to tools/ui

ad4913d

fix: Small fixes for UI build

5eed357

fix: CMake.txt syntax

b7df005

chore: Formatting

b2f2886

fix: .editorconfig for llama-ui

3adae5a

chore: Formatting

eb5216b

allozaur force-pushed the allozaur/llama-ui-restructuring branch from 771bb2f to a4cc564 May 15, 2026 11:08

github-actions bot added the python label May 15, 2026

github-actions bot added the testing, Nvidia GPU, ggml, and Hexagon labels May 15, 2026

allozaur force-pushed the allozaur/llama-ui-restructuring branch 2 times, most recently from 8974013 to 8c30b16 May 15, 2026 11:13

refactor: Use APP_NAME in Error route

bcd3694

allozaur force-pushed the allozaur/llama-ui-restructuring branch from 8c30b16 to bcd3694 May 15, 2026 11:15

allozaur added 2 commits May 15, 2026 14:13

refactor: Cleanup

0f53553

refactor: Single migration service

7d25dd2

allozaur closed this May 15, 2026

allozaur wants to merge 23 commits into master from allozaur/llama-ui-restructuring

allozaur commented May 15, 2026


Copilot AI left a comment


10 participants
