Add built-in local LLM summarization via llama-cpp-python#11
Merged
Conversation
Replace the default Ollama backend with a built-in local summarizer using llama-cpp-python and Phi-4-mini (~2.4 GB GGUF). Summarization now works out of the box with no external server required.

- Add `LlamaCppSummarizer` backend with automatic GGUF model download
- Wire download progress into the `PipelineProgress` TUI for all code paths (run_transcribe, run_summarize, run_resume, run_warmup)
- Suppress stderr noise (ggml_metal_init) during model loading
- Update the default config from ollama/mistral to local/phi-4-mini
- Add huggingface-hub and llama-cpp-python dependencies
- Fix tests to mock `_ensure_model` so they don't trigger real downloads

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allow users to specify any HuggingFace-hosted GGUF model in their config using the `hf:owner/repo/filename.gguf` syntax, which auto-downloads via `hf_hub_download`. This avoids requiring manual downloads for models not in the built-in registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR replaces the default Ollama-based summarization backend with a built-in local LLM using llama-cpp-python and Phi-4-mini (~2.4 GB GGUF model). Summarization now works out of the box without requiring external services like Ollama or LM Studio. The PR adds automatic model downloading with progress tracking integrated into the existing TUI, updates the default configuration, and fixes tests to prevent real downloads during test runs.
Changes:
- Implements `LlamaCppSummarizer` backend with automatic GGUF model download from HuggingFace
- Integrates download progress into the `PipelineProgress` TUI across all code paths (run_transcribe, run_summarize, run_warmup)
- Changes the default summarization backend from "ollama" to "local" and the default model from "mistral" to "phi-4-mini"
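A model registry of the kind described can be a simple mapping from friendly model names to HuggingFace coordinates. This is a hedged sketch: the registry name, the lookup helper, and the repo/filename values for phi-4-mini are all illustrative assumptions, not the PR's actual code.

```python
# Hypothetical registry mapping friendly names to (repo_id, filename).
# The phi-4-mini coordinates below are placeholders, not the real ones.
MODEL_REGISTRY: dict[str, tuple[str, str]] = {
    "phi-4-mini": ("example-org/phi-4-mini-GGUF", "phi-4-mini-Q4_K_M.gguf"),
}


def resolve_model(name: str) -> tuple[str, str]:
    """Resolve a config model name to (repo_id, filename), with a clear error."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(
            f"unknown model {name!r}; known models: {known}"
        ) from None


repo_id, filename = resolve_model("phi-4-mini")
print(repo_id, filename)
```

Keeping the registry as plain data makes it trivial to extend, and the `hf:` syntax from the follow-up commit provides an escape hatch for anything not listed.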
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 10 comments.
Summary per file:
| File | Description |
|---|---|
| src/ownscribe/summarization/llama_cpp_summarizer.py | New summarizer backend using llama-cpp-python with model registry, download logic, and stderr suppression |
| src/ownscribe/summarization/__init__.py | Updated factory to create LlamaCppSummarizer for the "local" backend |
| src/ownscribe/pipeline.py | Added _download_summarization_model helper; integrated model downloading into run_warmup, run_summarize, and _do_transcribe_and_summarize |
| src/ownscribe/progress.py | Added download_summarizer parameter to PipelineProgress for model download progress tracking |
| src/ownscribe/config.py | Changed default backend to "local" and default model to "phi-4-mini" |
| tests/test_pipeline.py | Added mocks for _ensure_model to prevent real downloads; added tests for warmup download behavior |
| tests/test_config.py | Updated test expectation for new default backend |
| pyproject.toml | Added llama-cpp-python and huggingface-hub dependencies |
| uv.lock | Locked versions for new dependencies |
| README.md | Updated documentation to reflect built-in model; removed Ollama requirement from prerequisites |
| CONTRIBUTING.md | Updated LLM backends description to mention built-in local model |
| AGENTS.md | Updated architecture notes to include LlamaCppSummarizer and lazy import of llama_cpp |
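The test change noted for tests/test_pipeline.py, mocking the model-fetch step so unit tests never hit the network, can be sketched as below. The class and `_ensure_model` method here are stand-ins patterned on the names in this PR, not its actual code.

```python
from pathlib import Path
from unittest.mock import patch


class LocalSummarizer:
    """Stand-in for a llama-cpp-based summarizer backend."""

    def _ensure_model(self) -> Path:
        # In real code this would download ~2.4 GB on first use.
        raise RuntimeError("network download attempted in a unit test")

    def summarize(self, text: str) -> str:
        model_path = self._ensure_model()
        return f"[summary via {model_path.name}] {text[:20]}"


def test_summarize_does_not_download():
    # Patch the download step so the test stays fast and offline.
    with patch.object(
        LocalSummarizer, "_ensure_model", return_value=Path("/tmp/fake.gguf")
    ) as mock_ensure:
        result = LocalSummarizer().summarize("meeting transcript ...")
    mock_ensure.assert_called_once()
    assert result.startswith("[summary via fake.gguf]")


test_summarize_does_not_download()
print("ok")
```

Patching at the method boundary keeps the rest of the pipeline code exercised for real, which is why only `_ensure_model` needs mocking.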
- Make `_suppress_stderr` robust with a try/except fallback to a no-op
- Add error handling for model download in run_summarize
- Support json_schema in chat() with a fallback chain
- Fix is_available() to check the llama_cpp import
- Add comprehensive unit tests for LlamaCppSummarizer and `_ensure_model`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>