
Built-in llama-cpp provider via inline ExtensionFactory #4823

Open
julien-c wants to merge 6 commits into earendil-works:main from julien-c:builtin-llama-cpp-provider

Conversation

@julien-c
Contributor

@julien-c julien-c commented May 20, 2026

Built-in llama-cpp provider, activated when any LLAMA_* env var is set (LLAMA_BASE_URL, LLAMA_CACHE, or LLAMA_ARG_*). Shipped as an inline ExtensionFactory so it can use the existing extension event hooks.

Discovers models from ${baseUrl}/models at startup and on /model, then refines per-model contextWindow via ${server}/props on first use.
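
A minimal sketch of the two parsing steps described above, not the code in this PR. It assumes that ${baseUrl}/models returns an OpenAI-style listing with a data array of objects carrying an id, and that ${server}/props reports the context size under default_generation_settings.n_ctx. The names parseModelIds and refineContextWindow and both interfaces are hypothetical.

```typescript
// Assumed response shapes; they are not taken from the PR diff.
interface ModelsResponse {
  data?: Array<{ id?: unknown }>;
}

interface PropsResponse {
  default_generation_settings?: { n_ctx?: unknown };
}

// Extract model ids from the listing, skipping malformed entries.
function parseModelIds(body: ModelsResponse): string[] {
  return (body.data ?? [])
    .map((entry) => entry.id)
    .filter((id): id is string => typeof id === "string" && id.length > 0);
}

// Prefer the context size reported by the server; otherwise keep the fallback.
function refineContextWindow(props: PropsResponse, fallback: number): number {
  const nCtx = props.default_generation_settings?.n_ctx;
  return typeof nCtx === "number" && Number.isInteger(nCtx) && nCtx > 0
    ? nCtx
    : fallback;
}
```

Keeping the parsing separate from the HTTP calls means the fallback contextWindow stays in place until the first successful props lookup for a model.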

Test plan

  • No LLAMA_* set: factory not instantiated, pi unchanged.
  • LLAMA_BASE_URL + running llama-server: provider appears, models list, streams.
  • Server down: surfaces a notify error, other providers still work.
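
The activation rule exercised by the first two cases could look roughly like the following sketch. The helper name hasLlamaEnv is hypothetical, and treating an empty string as "not set" is an assumption; the PR text only says the provider is activated when one of the listed variables is set.

```typescript
type Env = Record<string, string | undefined>;

// True when LLAMA_BASE_URL, LLAMA_CACHE, or any LLAMA_ARG_* variable is set.
// Assumption: an empty value counts as unset.
function hasLlamaEnv(env: Env): boolean {
  return Object.keys(env).some((key) => {
    const value = env[key];
    if (value === undefined || value === "") return false;
    return (
      key === "LLAMA_BASE_URL" ||
      key === "LLAMA_CACHE" ||
      key.startsWith("LLAMA_ARG_")
    );
  });
}
```

In a Node process this would be called with process.env, and the factory would only be registered when it returns true.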

julien-c and others added 6 commits May 20, 2026 19:26
Co-Authored-By: julien-agent <Agents+cyolo@huggingface.co>
Co-Authored-By: julien-agent <Agents+cyolo@huggingface.co>
Co-Authored-By: julien-agent <Agents+cyolo@huggingface.co>
Add ExtensionFactoryEntry union so inline factories can carry a path,
thread "<built-in:llama-cpp>" through the resource loader, and strip
the <built-in:NAME> wrapping in interactive-mode label helpers.

Co-Authored-By: julien-agent <Agents+cyolo@huggingface.co>
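
As an illustration of the label handling this commit describes, a helper that strips the <built-in:NAME> wrapping might look like the sketch below. The function name stripBuiltInWrapper is hypothetical and the actual label helpers in interactive mode may be structured differently.

```typescript
// Turn "<built-in:llama-cpp>" into "llama-cpp"; leave ordinary paths untouched.
function stripBuiltInWrapper(label: string): string {
  const match = /^<built-in:([^>]+)>$/.exec(label);
  return match?.[1] ?? label;
}
```

Anchoring the pattern at both ends keeps regular file paths that merely contain the text "built-in" unchanged.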
…veat

Built-in slash commands like /model are intercepted by the interactive
editor before emitInput runs, so the pi.on("input") handler never sees
them. Surface failures via ctx.ui.notify when ctx is available, and
leave a comment pointing at a future "model_selector_open" event as
the proper trigger to refresh the model list.

Co-Authored-By: julien-agent <Agents+cyolo@huggingface.co>
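
The "notify when ctx is available" behavior can be sketched as follows. The UiContext interface is a hypothetical minimal shape, and the (message, level) signature of notify is an assumption rather than the confirmed extension API; reportDiscoveryFailure is likewise an illustrative name.

```typescript
// Hypothetical minimal shape of the extension context; the real type may differ.
interface UiContext {
  ui: { notify(message: string, level: "info" | "warning" | "error"): void };
}

// Report a discovery failure through the UI when a context exists.
// Returns false when there is no context, so the caller can decide what to do.
function reportDiscoveryFailure(ctx: UiContext | undefined, error: unknown): boolean {
  if (!ctx) return false;
  const reason = error instanceof Error ? error.message : String(error);
  ctx.ui.notify(`llama-cpp: model discovery failed (${reason})`, "error");
  return true;
}
```

At startup there may be no context yet, which is why the sketch returns a flag instead of throwing.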
@julien-c julien-c marked this pull request as ready for review May 20, 2026 18:09
@julien-c
Contributor Author

Hi @badlogic, does this approach of hooking in a built-in provider for llama-cpp (to make local models as seamless as remote ones 🙏), based on a few possible env vars, look reasonable to you, or not really? (cc @hanouticelina, with whom we've worked on this)

@badlogic
Collaborator

i'll need a few days to find time to think about and review this. directionally, i think it is right. not sure about env var detection yet. could be a setting instead.

