Add built-in local LLM summarization via llama-cpp-python#11
Merged
Conversation
Replace the default Ollama backend with a built-in local summarizer using llama-cpp-python and Phi-4-mini (~2.4 GB GGUF). Summarization now works out of the box with no external server required.

- Add `LlamaCppSummarizer` backend with automatic GGUF model download
- Wire download progress into the `PipelineProgress` TUI for all code paths (run_transcribe, run_summarize, run_resume, run_warmup)
- Suppress stderr noise (ggml_metal_init) during model loading
- Update the default config from ollama/mistral to local/phi-4-mini
- Add huggingface-hub and llama-cpp-python dependencies
- Fix tests to mock `_ensure_model` so they don't trigger real downloads

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allow users to specify any HuggingFace-hosted GGUF model in their config using the `hf:owner/repo/filename.gguf` syntax, which auto-downloads via `hf_hub_download`. This avoids requiring manual downloads for models not in the built-in registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR replaces the default Ollama-based summarization backend with a built-in local LLM using llama-cpp-python and Phi-4-mini (~2.4 GB GGUF model). Summarization now works out of the box without requiring external services like Ollama or LM Studio. The PR adds automatic model downloading with progress tracking integrated into the existing TUI, updates the default configuration, and fixes tests to prevent real downloads during test runs.
Changes:
- Implements `LlamaCppSummarizer` backend with automatic GGUF model download from HuggingFace
- Integrates download progress into the `PipelineProgress` TUI across all code paths (run_transcribe, run_summarize, run_warmup)
- Changes the default summarization backend from "ollama" to "local" and the default model from "mistral" to "phi-4-mini"
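A model registry of the kind described can be a simple mapping from friendly model names to HuggingFace coordinates. This is a hedged sketch: the registry name, the lookup helper, and the repo/filename values for phi-4-mini are all illustrative assumptions, not the PR's actual code.

```python
# Hypothetical registry mapping friendly names to (repo_id, filename).
# The phi-4-mini coordinates below are placeholders, not the real ones.
MODEL_REGISTRY: dict[str, tuple[str, str]] = {
    "phi-4-mini": ("example-org/phi-4-mini-GGUF", "phi-4-mini-Q4_K_M.gguf"),
}


def resolve_model(name: str) -> tuple[str, str]:
    """Resolve a config model name to (repo_id, filename), with a clear error."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(
            f"unknown model {name!r}; known models: {known}"
        ) from None


repo_id, filename = resolve_model("phi-4-mini")
print(repo_id, filename)
```

Keeping the registry as plain data makes it trivial to extend, and the `hf:` syntax from the follow-up commit provides an escape hatch for anything not listed.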
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 10 comments.
Summary per file:
| File | Description |
|---|---|
| src/ownscribe/summarization/llama_cpp_summarizer.py | New summarizer backend using llama-cpp-python with model registry, download logic, and stderr suppression |
| src/ownscribe/summarization/__init__.py | Updated factory to create LlamaCppSummarizer for the "local" backend |
| src/ownscribe/pipeline.py | Added _download_summarization_model helper; integrated model downloading into run_warmup, run_summarize, and _do_transcribe_and_summarize |
| src/ownscribe/progress.py | Added download_summarizer parameter to PipelineProgress for model download progress tracking |
| src/ownscribe/config.py | Changed default backend to "local" and default model to "phi-4-mini" |
| tests/test_pipeline.py | Added mocks for _ensure_model to prevent real downloads; added tests for warmup download behavior |
| tests/test_config.py | Updated test expectation for new default backend |
| pyproject.toml | Added llama-cpp-python and huggingface-hub dependencies |
| uv.lock | Locked versions for new dependencies |
| README.md | Updated documentation to reflect built-in model; removed Ollama requirement from prerequisites |
| CONTRIBUTING.md | Updated LLM backends description to mention built-in local model |
| AGENTS.md | Updated architecture notes to include LlamaCppSummarizer and lazy import of llama_cpp |
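The test change noted for tests/test_pipeline.py, mocking the model-fetch step so unit tests never hit the network, can be sketched as below. The class and `_ensure_model` method here are stand-ins patterned on the names in this PR, not its actual code.

```python
from pathlib import Path
from unittest.mock import patch


class LocalSummarizer:
    """Stand-in for a llama-cpp-based summarizer backend."""

    def _ensure_model(self) -> Path:
        # In real code this would download ~2.4 GB on first use.
        raise RuntimeError("network download attempted in a unit test")

    def summarize(self, text: str) -> str:
        model_path = self._ensure_model()
        return f"[summary via {model_path.name}] {text[:20]}"


def test_summarize_does_not_download():
    # Patch the download step so the test stays fast and offline.
    with patch.object(
        LocalSummarizer, "_ensure_model", return_value=Path("/tmp/fake.gguf")
    ) as mock_ensure:
        result = LocalSummarizer().summarize("meeting transcript ...")
    mock_ensure.assert_called_once()
    assert result.startswith("[summary via fake.gguf]")


test_summarize_does_not_download()
print("ok")
```

Patching at the method boundary keeps the rest of the pipeline code exercised for real, which is why only `_ensure_model` needs mocking.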
- Make `_suppress_stderr` robust with a try/except fallback to a no-op
- Add error handling for model download in run_summarize
- Support json_schema in chat() with a fallback chain
- Fix is_available() to check the llama_cpp import
- Add comprehensive unit tests for LlamaCppSummarizer and `_ensure_model`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>