50 changes: 50 additions & 0 deletions AGENTS.md
@@ -0,0 +1,50 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
uv run pytest # run all tests
uv run pytest -v # verbose
uv run pytest -v -k test_search                                # run tests matching a keyword (e.g. the search module)
uv run pytest -v -k "TestRankMeetings and test_speaker_boost"  # single test
uv run ruff check src/ tests/ # lint
uv run ruff format src/ tests/ # auto-format
```

## Architecture

**ownscribe** is a CLI tool for local meeting recording, transcription, and summarization. The main pipeline is: Record → Transcribe → Summarize → Output.

### Plugin systems with abstract base classes

Each stage has a base class in its subpackage and one or more implementations:

- **Audio** (`audio/base.py`): `CoreAudioRecorder` (macOS, wraps a Swift binary in `swift/`) and `SoundDeviceRecorder` (cross-platform fallback). Selected in `pipeline.py:_create_recorder()`.
- **Transcription** (`transcription/base.py`): `WhisperXTranscriber` (single impl). Data models (`Segment`, `Word`, `TranscriptResult`) live in `transcription/models.py`.
- **Summarization** (`summarization/base.py`): `OllamaSummarizer` and `OpenAISummarizer`. Factory is `summarization/__init__.py:create_summarizer()` — used by both `pipeline.py` and `search.py`.
- **Output** (`output/`): `markdown.py` and `json_output.py`, selected by `config.output.format`.
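
The base-class-plus-factory pattern above can be sketched roughly as follows (class bodies simplified for illustration; the real interfaces live in the `base.py` files and the real factory in `summarization/__init__.py`):

```python
from abc import ABC, abstractmethod


class Summarizer(ABC):
    """Hypothetical sketch of a stage's abstract base class."""

    @abstractmethod
    def summarize(self, transcript: str) -> str: ...

    @abstractmethod
    def is_available(self) -> bool: ...


class OllamaSummarizer(Summarizer):
    # Real implementation would call the Ollama HTTP API.
    def summarize(self, transcript: str) -> str:
        return f"summary of {len(transcript)} chars"

    def is_available(self) -> bool:
        return True


def create_summarizer(backend: str) -> Summarizer:
    # Factory selects the implementation from config, in the spirit of
    # summarization/__init__.py:create_summarizer().
    if backend == "openai":
        raise NotImplementedError("sketch only covers the ollama branch")
    return OllamaSummarizer()
```

Callers depend only on the `Summarizer` interface, which is what lets `pipeline.py` and `search.py` share one factory.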

### Key modules

- **`cli.py`** — Click command group. Entry point: `ownscribe.cli:cli`. All subcommands (`ask`, `transcribe`, `summarize`, `devices`, `apps`, `config`, `cleanup`).
- **`pipeline.py`** — Orchestrates the record → transcribe → summarize flow. Creates timestamped output dirs (`~/ownscribe/YYYY-MM-DD_HHMM_slug/`).
- **`search.py`** — Two-stage LLM search over meeting notes. Stage 1 scores summaries for relevance; stage 2 synthesizes answers from full transcripts. Has keyword fallback and quote verification. Helper functions return data; only `ask()` calls `click.echo`.
- **`config.py`** — Dataclass hierarchy (`Config` → `AudioConfig`, `TranscriptionConfig`, `SummarizationConfig`, etc.). Loaded from `~/.config/ownscribe/config.toml` with env var overrides (`HF_TOKEN`, `OLLAMA_HOST`).
- **`summarization/prompts.py`** — Built-in prompt templates (meeting, lecture, brief) plus search prompts. Users can define custom templates in config TOML.
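
A minimal sketch of the dataclass hierarchy with env-var overrides (field names abbreviated; the real `config.py` also loads the TOML file):

```python
import os
from dataclasses import dataclass, field


@dataclass
class SummarizationConfig:
    backend: str = "ollama"
    model: str = "mistral"
    host: str = "http://localhost:11434"


@dataclass
class Config:
    summarization: SummarizationConfig = field(default_factory=SummarizationConfig)


def load_config() -> Config:
    # TOML parsing omitted here; env vars override file values, as
    # OLLAMA_HOST does in the real config.py.
    cfg = Config()
    if host := os.environ.get("OLLAMA_HOST"):
        cfg.summarization.host = host
    return cfg
```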

### Testing conventions

- Uses `pytest` with `pytest-httpserver` for mocking HTTP APIs (Ollama, OpenAI).
- Shared fixtures in `conftest.py`: `sample_transcript`, `diarized_transcript`, `synthetic_wav`.
- Tests use `FakeSummarizer` (in `test_search.py`) or `unittest.mock` for pipeline tests.
- Markers: `@pytest.mark.hardware` (auto-skipped in CI), `@pytest.mark.macos` (auto-skipped on non-macOS).
- When mocking the shared summarizer factory in pipeline tests, patch `ownscribe.pipeline.create_summarizer` (it's imported at module level).
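
The patch-where-used rule can be illustrated with a self-contained sketch (the module is simulated with a namespace here; real tests target the string `"ownscribe.pipeline.create_summarizer"`):

```python
import types
from unittest import mock

# Stand-in for ownscribe.pipeline, which imports create_summarizer at
# module level; tests must replace the name in *pipeline's* namespace,
# not in the summarization package where it is defined.
pipeline = types.SimpleNamespace(
    create_summarizer=lambda config: "real summarizer",
)


class FakeSummarizer:
    def summarize(self, transcript: str) -> str:
        return "fake summary"


def run(config):
    # Looks the factory up through the pipeline namespace, as the real
    # run_pipeline does after its module-level import.
    return pipeline.create_summarizer(config).summarize("...")


with mock.patch.object(pipeline, "create_summarizer", return_value=FakeSummarizer()):
    result = run(config=None)
```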

## Style

- Python 3.12+. Ruff with line-length 120.
- `from __future__ import annotations` in all modules.
- Lazy imports for heavy dependencies (whisperx, ollama, openai) — imported inside functions, not at module level.
- Helper functions return data; orchestrator functions (`ask()`, `run_pipeline()`) handle all `click.echo` output.
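
The lazy-import and helpers-return-data conventions together look roughly like this (`json` stands in for a heavy dependency such as whisperx):

```python
def transcribe(path: str) -> list[str]:
    # Heavy dependency imported lazily inside the function, so importing
    # the package itself stays fast.
    import json  # stand-in for a heavy module like whisperx

    return [json.dumps({"file": path})]


def run(path: str) -> None:
    # Orchestrator handles all output; helpers like transcribe() only
    # return data and never print.
    for line in transcribe(path):
        print(line)
```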
19 changes: 19 additions & 0 deletions README.md
@@ -32,6 +32,7 @@ All audio, transcripts, and summaries remain local.
- **Pipeline progress** — live checklist showing transcription, diarization sub-steps, and summarization progress
- **Local LLM summarization** — structured meeting notes via Ollama, LM Studio, or any OpenAI-compatible server
- **Summarization templates** — built-in presets for meetings, lectures, and quick briefs; define your own in config
- **Ask your meetings** — ask natural-language questions across all your meeting notes; uses a two-stage LLM pipeline with keyword fallback
- **One command** — just run `ownscribe`, press Ctrl+C when done, get transcript + summary

## Requirements
@@ -118,10 +119,27 @@ ownscribe devices # list audio devices (uses native CoreAudio w
ownscribe apps # list running apps with PIDs for use with --pid
ownscribe transcribe recording.wav # transcribe an existing audio file
ownscribe summarize transcript.md # summarize an existing transcript
ownscribe ask "question" # search your meetings with a natural-language question
ownscribe config # open config file in $EDITOR
ownscribe cleanup # remove ownscribe data from disk
```

### Searching Meeting Notes

Use `ask` to search across all your meeting notes with natural-language questions:

```bash
ownscribe ask "What did Anna say about the deadline?"
ownscribe ask "budget decisions" --since 2026-01-01
ownscribe ask "action items from last week" --limit 5
```

This runs a two-stage pipeline:
1. **Find** — sends meeting summaries to the LLM to identify which meetings are relevant
2. **Answer** — sends the full transcripts of relevant meetings to the LLM to produce an answer with quotes

If the LLM finds no relevant meetings, a keyword fallback searches summaries and transcripts directly.
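
The control flow above can be sketched like this (function names hypothetical; the LLM calls are stubbed with keyword checks):

```python
def score_summary(question: str, summary: str) -> int:
    # Placeholder for the stage-1 LLM relevance call.
    return 1 if any(w in summary.lower() for w in question.lower().split()) else 0


def answer_from_transcripts(question: str, transcripts: list[str]) -> str:
    # Placeholder for the stage-2 LLM synthesis call.
    return f"Answer drawn from {len(transcripts)} transcript(s)."


def ask(question: str, meetings: list[dict]) -> str:
    # Stage 1: identify relevant meetings from their summaries.
    relevant = [m for m in meetings if score_summary(question, m["summary"]) > 0]

    # Keyword fallback when stage 1 finds nothing relevant.
    if not relevant:
        words = question.lower().split()
        relevant = [
            m for m in meetings
            if any(w in (m["summary"] + m["transcript"]).lower() for w in words)
        ]
    if not relevant:
        return "No relevant meetings found."

    # Stage 2: synthesize an answer from the full transcripts.
    return answer_from_transcripts(question, [m["transcript"] for m in relevant])
```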

## Configuration

Config is stored at `~/.config/ownscribe/config.toml`. Run `ownscribe config` to create and edit it.
@@ -149,6 +167,7 @@ backend = "ollama" # "ollama" or "openai"
model = "mistral"
host = "http://localhost:11434"
# template = "meeting" # "meeting", "lecture", "brief", or a custom name
# context_size = 0 # 0 = auto-detect from model; set manually for OpenAI-compatible backends

# Custom templates (optional):
# [templates.my-standup]
13 changes: 13 additions & 0 deletions src/ownscribe/cli.py
@@ -97,6 +97,19 @@ def cli(
run_pipeline(config)


@cli.command()
@click.argument("question")
@click.option("--since", default=None, help="Only search meetings after this date (YYYY-MM-DD).")
@click.option("--limit", default=None, type=int, help="Max number of recent meetings to search.")
@click.pass_context
def ask(ctx: click.Context, question: str, since: str | None, limit: int | None) -> None:
"""Ask a question across your meeting notes."""
config = ctx.obj["config"]
from ownscribe.search import ask as run_ask

run_ask(config, question, since=since, limit=limit)


@cli.command()
def devices() -> None:
"""List available audio input devices."""
2 changes: 2 additions & 0 deletions src/ownscribe/config.py
@@ -35,6 +35,7 @@
model = "mistral" # model name
host = "http://localhost:11434" # ollama: :11434, LM Studio: :1234
# template = "meeting" # built-in: "meeting", "lecture", or "brief"
# context_size = 0 # 0 = auto-detect from model; set manually for OpenAI-compatible backends

# Custom templates (optional):
# [templates.my-notes]
@@ -79,6 +80,7 @@ class SummarizationConfig:
model: str = "mistral"
host: str = "http://localhost:11434"
template: str = ""
context_size: int = 0


@dataclass
14 changes: 3 additions & 11 deletions src/ownscribe/pipeline.py
@@ -17,6 +17,7 @@

from ownscribe.config import Config
from ownscribe.progress import PipelineProgress, Spinner
from ownscribe.summarization import create_summarizer

# A standard WAV file header (RIFF + fmt + data chunk header) is 44 bytes.
# Files at or below this size contain no audio frames.
@@ -87,15 +88,6 @@ def _create_transcriber(config: Config, progress=None):
return WhisperXTranscriber(config.transcription, diar_config, progress=progress)


def _create_summarizer(config: Config):
"""Create the appropriate summarizer based on config."""
if config.summarization.backend == "openai":
from ownscribe.summarization.openai_summarizer import OpenAISummarizer
return OpenAISummarizer(config.summarization, config.templates)
else:
from ownscribe.summarization.ollama_summarizer import OllamaSummarizer
return OllamaSummarizer(config.summarization, config.templates)


def _format_output(config: Config, transcript_result, summary_text: str | None = None) -> tuple[str, str | None]:
"""Format transcript and optional summary. Returns (transcript_str, summary_str)."""
@@ -233,7 +225,7 @@ def run_summarize(config: Config, transcript_file: str) -> None:
"""Summarize a transcript file."""
transcript_text = Path(transcript_file).read_text()

summarizer = _create_summarizer(config)
summarizer = create_summarizer(config)
if not summarizer.is_available():
click.echo(
f"Error: {config.summarization.backend} is not reachable at {config.summarization.host}. "
@@ -295,7 +287,7 @@ def _do_transcribe_and_summarize(

# 3. Summarize
if sum_enabled:
summarizer = _create_summarizer(config)
summarizer = create_summarizer(config)
if not summarizer.is_available():
click.echo(
f"\nWarning: {config.summarization.backend} is not reachable "