
Add built-in local LLM summarization via llama-cpp-python #11

Merged
paberr merged 3 commits into main from add-local-llm-summarization on Feb 24, 2026

Conversation

@paberr (Owner) commented on Feb 24, 2026

Replace the default Ollama backend with a built-in local summarizer using llama-cpp-python and Phi-4-mini (~2.4 GB GGUF). Summarization now works out of the box with no external server required.

  • Add LlamaCppSummarizer backend with automatic GGUF model download
  • Wire download progress into PipelineProgress TUI for all code paths (run_transcribe, run_summarize, run_resume, run_warmup)
  • Suppress stderr noise (ggml_metal_init) during model loading
  • Update default config from ollama/mistral to local/phi-4-mini
  • Add huggingface-hub and llama-cpp-python dependencies
  • Fix tests to mock _ensure_model so they don't trigger real downloads
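
The download-on-first-use behavior described above can be sketched roughly as follows. The registry contents (HuggingFace repo id and quantized filename) and the helper names `resolve_model`/`ensure_model` are illustrative assumptions for this sketch, not taken from the actual source:

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical built-in registry mapping short config names to GGUF files.
# The repo id and filename for "phi-4-mini" are assumptions, not from the PR.
MODEL_REGISTRY = {
    "phi-4-mini": ("microsoft/Phi-4-mini-instruct-gguf", "Phi-4-mini-instruct-Q4_K_M.gguf"),
}


@dataclass
class ResolvedModel:
    repo_id: str
    filename: str


def resolve_model(name: str) -> ResolvedModel:
    """Map a short model name from the config to a HuggingFace repo/file pair."""
    try:
        repo_id, filename = MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown summarization model: {name!r}")
    return ResolvedModel(repo_id, filename)


def ensure_model(name: str) -> Path:
    """Download the GGUF file on first use; hf_hub_download caches locally,
    so subsequent calls return the cached path without hitting the network."""
    from huggingface_hub import hf_hub_download  # lazy import, mirroring the PR's style

    spec = resolve_model(name)
    return Path(hf_hub_download(repo_id=spec.repo_id, filename=spec.filename))
```

Keeping the `huggingface_hub` import inside `ensure_model` means merely constructing the summarizer stays cheap, and tests can mock `ensure_model` (as the last bullet describes for `_ensure_model`) without the dependency being touched.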

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 24, 2026 22:02
@paberr added the "enhancement" (New feature or request) label on Feb 24, 2026
@paberr self-assigned this on Feb 24, 2026
Allow users to specify any HuggingFace-hosted GGUF model in their
config using the hf:owner/repo/filename.gguf syntax, which
auto-downloads via hf_hub_download. This avoids requiring manual
downloads for models not in the built-in registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
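
A minimal parser for the `hf:owner/repo/filename.gguf` syntax mentioned above might look like this. This is a sketch under the assumption that the spec splits on the first two slashes (the PR does not show the actual parsing code):

```python
from typing import Tuple


def parse_hf_spec(model: str) -> Tuple[str, str]:
    """Split an "hf:owner/repo/filename.gguf" config value into a
    (repo_id, filename) pair suitable for hf_hub_download.

    Raises ValueError if the value is not a well-formed hf: spec.
    """
    if not model.startswith("hf:"):
        raise ValueError(f"not an hf: spec: {model!r}")
    owner, _, rest = model[3:].partition("/")
    repo, _, filename = rest.partition("/")
    # Everything after the second slash is the filename, so GGUF files
    # nested in repo subdirectories keep their path intact.
    if not (owner and repo and filename.endswith(".gguf")):
        raise ValueError(f"expected hf:owner/repo/filename.gguf, got {model!r}")
    return f"{owner}/{repo}", filename
```

The resulting pair can then be fed straight to `hf_hub_download(repo_id=..., filename=...)`, which is what lets arbitrary HuggingFace-hosted GGUF models bypass the built-in registry.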
Copilot AI left a comment

Pull request overview

This PR replaces the default Ollama-based summarization backend with a built-in local LLM using llama-cpp-python and Phi-4-mini (~2.4 GB GGUF model). Summarization now works out of the box without requiring external services like Ollama or LM Studio. The PR adds automatic model downloading with progress tracking integrated into the existing TUI, updates the default configuration, and fixes tests to prevent real downloads during test runs.

Changes:

  • Implements LlamaCppSummarizer backend with automatic GGUF model download from HuggingFace
  • Integrates download progress into PipelineProgress TUI across all code paths (run_transcribe, run_summarize, run_warmup)
  • Changes default summarization backend from "ollama" to "local" and model from "mistral" to "phi-4-mini"

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
src/ownscribe/summarization/llama_cpp_summarizer.py New summarizer backend using llama-cpp-python with model registry, download logic, and stderr suppression
src/ownscribe/summarization/__init__.py Updated factory to create LlamaCppSummarizer for the "local" backend
src/ownscribe/pipeline.py Added _download_summarization_model helper; integrated model downloading into run_warmup, run_summarize, and _do_transcribe_and_summarize
src/ownscribe/progress.py Added download_summarizer parameter to PipelineProgress for model download progress tracking
src/ownscribe/config.py Changed default backend to "local" and default model to "phi-4-mini"
tests/test_pipeline.py Added mocks for _ensure_model to prevent real downloads; added tests for warmup download behavior
tests/test_config.py Updated test expectation for new default backend
pyproject.toml Added llama-cpp-python and huggingface-hub dependencies
uv.lock Locked versions for new dependencies
README.md Updated documentation to reflect built-in model; removed Ollama requirement from prerequisites
CONTRIBUTING.md Updated LLM backends description to mention built-in local model
AGENTS.md Updated architecture notes to include LlamaCppSummarizer and lazy import of llama_cpp


- Make _suppress_stderr robust with try/except fallback to no-op
- Add error handling for model download in run_summarize
- Support json_schema in chat() with fallback chain
- Fix is_available() to check llama_cpp import
- Add comprehensive unit tests for LlamaCppSummarizer and _ensure_model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
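
The first and fourth bullets of this follow-up commit could be sketched as below. Both helpers are illustrative assumptions about the implementation: the real `_suppress_stderr` and `is_available()` are not shown in the PR, and the fd-redirection approach here is one common way to silence C-level logs such as `ggml_metal_init`:

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_stderr():
    """Redirect C-level stderr (e.g. ggml_metal_init noise) to /dev/null.

    Falls back to a no-op when the fd dance is impossible, e.g. when a
    test runner has replaced sys.stderr with an object lacking fileno().
    """
    try:
        fd = sys.stderr.fileno()
        saved = os.dup(fd)
        devnull = os.open(os.devnull, os.O_WRONLY)
    except (OSError, ValueError, AttributeError):
        yield  # robust fallback: do nothing rather than crash
        return
    try:
        os.dup2(devnull, fd)
        yield
    finally:
        os.dup2(saved, fd)  # always restore the original stderr
        os.close(saved)
        os.close(devnull)


def is_available() -> bool:
    """Report whether the local backend can run, by probing the import
    rather than assuming the optional dependency is installed."""
    try:
        import llama_cpp  # noqa: F401
        return True
    except ImportError:
        return False
```

Probing the `llama_cpp` import keeps `is_available()` honest on machines where the wheel failed to build, and the try/except around the fd setup is what makes the suppressor safe under pytest's output capture.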
@paberr merged commit ec9a05c into main on Feb 24, 2026
2 checks passed
@paberr paberr deleted the add-local-llm-summarization branch February 24, 2026 22:32
