feat: base provider abstraction + Claude adapter (phase 1)#242
Conversation
- BaseProvider ABC + SessionInfo dataclass for multi-provider support - Factory helpers: make_user_entry, make_assistant_entry, make_thinking_entry, make_tool_use_entry, make_tool_result_entry, extract_text, file_mtime_iso - ClaudeProvider wrapping existing claude-code-log functionality - ProviderRegistry with auto-discovery and lazy instantiation - discover_providers() entry point for provider discovery - discover_all_sessions() and discover_sessions_by_provider() for unified access - 0 Pyright errors, 2207 tests pass - Docs symlinks preserved (not replaced with copies)
|
@cboos You make a good point that we're landing a lot. I've been chewing away at making this less POC and more feature-hub so we can easily add more providers as time goes on, if wanted. I'm updating this PR as the main hub, then will submit AGY as a separate. Once you're happy with those, I can refine the Codex, OpenCode, etc other providers. What's included in this PR:
All existing tests pass (2207 on this branch). No behavior change — Edited to add: This is my first substantive PR to a project. Thanks for your patience as I keep learning the ropes. |
📝 WalkthroughWalkthroughAdds shared provider abstractions and session metadata, a Claude provider implementation, a registry for provider discovery and loading, and public discovery helpers plus package exports. ChangesProvider discovery and registry
Sequence Diagram(s)sequenceDiagram
participant Caller
participant discover_all_sessions
participant discover_providers
participant ProviderRegistry
participant ClaudeProvider
Caller->>discover_all_sessions: request session enumeration
discover_all_sessions->>discover_providers: build registry
discover_providers->>ProviderRegistry: register ClaudeProvider
discover_providers->>ProviderRegistry: instantiate_registered()
discover_all_sessions->>ProviderRegistry: get_available_providers()
ProviderRegistry->>ClaudeProvider: discover_sessions()
ClaudeProvider-->>ProviderRegistry: SessionInfo values
ProviderRegistry-->>discover_all_sessions: SessionInfo values
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Warning |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (6)
claude_code_log/providers/registry.py (2)
68-70: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAdd an explicit return type annotation for consistency.
Every other method on
ProviderRegistryis annotated;load_sessionreturnsprovider.load_session(...), i.e.Iterator[TranscriptEntry]. Annotating it keeps the public surface consistent and explicit.Proposed change
def load_session( self, provider_name: str, session_id: str, max_messages: Optional[int] = None - ): + ) -> Iterator[TranscriptEntry]:(requires importing
TranscriptEntryandfrom .base import ...)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/providers/registry.py` around lines 68 - 70, Add an explicit return type annotation to ProviderRegistry.load_session for consistency with the rest of the class. Update the load_session method signature in ProviderRegistry to declare the Iterator[TranscriptEntry] return type, and make sure TranscriptEntry is imported alongside the existing base provider symbols so the public API stays explicit and consistent.
29-36: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winSilently swallowing provider init failures hampers diagnosability.
A bare
except Exception: passmeans a misconfigured or broken provider disappears with no trace, making field issues very hard to debug. At minimum log the failure (with provider name) at warning/debug level.Proposed change
def instantiate_registered(self) -> None: - for provider_class in self._provider_classes.values(): + for name, provider_class in self._provider_classes.items(): try: provider = provider_class() self.register(provider) - except Exception: - # Skip providers that fail to initialize - pass + except Exception: + logger.warning( + "Skipping provider %r: failed to initialize", name, exc_info=True + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/providers/registry.py` around lines 29 - 36, The instantiate_registered method in the registry currently swallows provider initialization failures with a bare except, which hides broken providers; update this path to log the exception at warning or debug level and include the provider class name before skipping it. Use the existing instantiate_registered and register flow in ProviderRegistry to identify the failing provider, and keep the fallback behavior of continuing with the remaining providers.claude_code_log/providers/base.py (1)
49-50: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low value
file_mtime_isoemits a timezone-naive local timestamp.
datetime.fromtimestamp(...)without atzargument produces a naive local-time value, socreated_at/updated_atstrings lack an offset. Once sessions from multiple providers (or machines in different zones) are sorted/compared, this is ambiguous. Consider anchoring to UTC.Proposed change
-from datetime import datetime +from datetime import datetime, timezone @@ def file_mtime_iso(path: Path) -> str: - return datetime.fromtimestamp(path.stat().st_mtime).isoformat() + return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc).isoformat()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/providers/base.py` around lines 49 - 50, The file_mtime_iso helper currently returns a timezone-naive local timestamp, so the created_at/updated_at values are ambiguous across machines and providers. Update file_mtime_iso in base.py to produce a timezone-aware UTC ISO string by anchoring datetime.fromtimestamp to UTC, and keep the change localized to this helper so callers continue using file_mtime_iso unchanged.claude_code_log/discovery.py (3)
45-55: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAdd a docstring to
get_session_stats.Every other public function in this module has a docstring;
get_session_statsis the only one without one. Document the return shape ({provider_name: session_count}) for consistency.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/discovery.py` around lines 45 - 55, Add a docstring to get_session_stats in discovery.py so it matches the other public functions in the module. Describe that it returns a dict mapping provider names to session counts, and note the return shape as {provider_name: session_count} for consistency.
9-29: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider delegating to the registry to avoid duplicated discovery logic.
discover_all_sessions(no filter) andget_session_statsreimplement the iterate-available-providers / yield-from-discover_sessionsflow thatProviderRegistryalready exposes (discover_all_sessions,discover_sessions_by_provider). The provider-name filter here is the only added value; the rest could route through the registry to keep a single source of truth for availability and iteration semantics.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/discovery.py` around lines 9 - 29, discover_all_sessions currently duplicates the provider iteration and availability checks that ProviderRegistry already owns. Update discover_all_sessions to delegate to the registry’s discover_all_sessions and discover_sessions_by_provider helpers, keeping only the provider-name filter logic here. Use discover_providers, ProviderRegistry, and discover_sessions_by_provider to centralize availability and iteration semantics in one place.
58-69: 📐 Maintainability & Code Quality | 🔵 TrivialAdd a return annotation to
load_session.
claude_code_log/discovery.py:58should mirror the provider API with-> Iterator[TranscriptEntry];TranscriptEntryis already available from.providers.base, so the wrapper can expose the same contract explicitly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@claude_code_log/discovery.py` around lines 58 - 69, Add an explicit return type to load_session in discovery.py so the wrapper matches the provider API; update the function signature to return Iterator[TranscriptEntry], using the already imported TranscriptEntry from .providers.base, and keep the body delegating through discover_providers().load_session(provider_name, session_id).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@claude_code_log/providers/claude.py`:
- Around line 40-56: The load_session method in
claude_code_log/providers/claude.py ignores the max_messages parameter and
always returns the full transcript from load_transcript(jsonl_file). Update
load_session to enforce the cap by passing max_messages through to the
transcript loading path if supported, or by truncating iteration after
max_messages entries before returning. Keep the change localized to load_session
and any directly related transcript-loading helper it calls.
---
Nitpick comments:
In `@claude_code_log/discovery.py`:
- Around line 45-55: Add a docstring to get_session_stats in discovery.py so it
matches the other public functions in the module. Describe that it returns a
dict mapping provider names to session counts, and note the return shape as
{provider_name: session_count} for consistency.
- Around line 9-29: discover_all_sessions currently duplicates the provider
iteration and availability checks that ProviderRegistry already owns. Update
discover_all_sessions to delegate to the registry’s discover_all_sessions and
discover_sessions_by_provider helpers, keeping only the provider-name filter
logic here. Use discover_providers, ProviderRegistry, and
discover_sessions_by_provider to centralize availability and iteration semantics
in one place.
- Around line 58-69: Add an explicit return type to load_session in discovery.py
so the wrapper matches the provider API; update the function signature to return
Iterator[TranscriptEntry], using the already imported TranscriptEntry from
.providers.base, and keep the body delegating through
discover_providers().load_session(provider_name, session_id).
In `@claude_code_log/providers/base.py`:
- Around line 49-50: The file_mtime_iso helper currently returns a
timezone-naive local timestamp, so the created_at/updated_at values are
ambiguous across machines and providers. Update file_mtime_iso in base.py to
produce a timezone-aware UTC ISO string by anchoring datetime.fromtimestamp to
UTC, and keep the change localized to this helper so callers continue using
file_mtime_iso unchanged.
In `@claude_code_log/providers/registry.py`:
- Around line 68-70: Add an explicit return type annotation to
ProviderRegistry.load_session for consistency with the rest of the class. Update
the load_session method signature in ProviderRegistry to declare the
Iterator[TranscriptEntry] return type, and make sure TranscriptEntry is imported
alongside the existing base provider symbols so the public API stays explicit
and consistent.
- Around line 29-36: The instantiate_registered method in the registry currently
swallows provider initialization failures with a bare except, which hides broken
providers; update this path to log the exception at warning or debug level and
include the provider class name before skipping it. Use the existing
instantiate_registered and register flow in ProviderRegistry to identify the
failing provider, and keep the fallback behavior of continuing with the
remaining providers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bed43211-85e1-4f3e-ab40-9fe78af5e58e
📒 Files selected for processing (5)
claude_code_log/discovery.pyclaude_code_log/providers/__init__.pyclaude_code_log/providers/base.pyclaude_code_log/providers/claude.pyclaude_code_log/providers/registry.py
| def load_session( | ||
| self, session_id: str, max_messages: Optional[int] = None | ||
| ) -> Iterator[TranscriptEntry]: | ||
| from claude_code_log.converter import load_transcript | ||
|
|
||
| data_dir = self.get_data_dir() | ||
| if data_dir is None: | ||
| raise ValueError("Claude data directory not found") | ||
|
|
||
| for project_dir in data_dir.iterdir(): | ||
| if not project_dir.is_dir(): | ||
| continue | ||
| jsonl_file = project_dir / f"{session_id}.jsonl" | ||
| if jsonl_file.exists(): | ||
| return iter(load_transcript(jsonl_file)) | ||
|
|
||
| raise FileNotFoundError(f"Session {session_id} not found") |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect load_transcript signature/return type
fd -t f 'converter.py' claude_code_log
ast-grep run --pattern 'def load_transcript($$$):
$$$' --lang python $(fd -t f 'converter.py' claude_code_log)Repository: daaain/claude-code-log
Length of output: 189
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== outline converter.py =="
ast-grep outline claude_code_log/converter.py --view expanded || true
echo
echo "== search load_transcript / max_messages usages =="
rg -n "load_transcript|max_messages" claude_code_logRepository: daaain/claude-code-log
Length of output: 4261
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== read converter.py relevant section =="
wc -l claude_code_log/converter.py
sed -n '1,260p' claude_code_log/converter.py
echo
echo "== read provider/registry relevant sections =="
rg -n "def load_session|max_messages|load_transcript" claude_code_log/providers claude_code_log | sed -n '1,200p'Repository: daaain/claude-code-log
Length of output: 12573
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,260p' claude_code_log/converter.py
echo
sed -n '1,220p' claude_code_log/providers/claude.py
echo
rg -n "max_messages|load_session\\(" claude_code_logRepository: daaain/claude-code-log
Length of output: 12196
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,260p' claude_code_log/converter.py
echo
sed -n '1,220p' claude_code_log/providers/claude.py
echo
rg -n "max_messages|load_session\\(" claude_code_logRepository: daaain/claude-code-log
Length of output: 12196
max_messages needs to be enforced in claude_code_log/providers/claude.py::load_session
max_messages is accepted and forwarded by the registry, but this method always returns load_transcript(jsonl_file) in full. That leaves large sessions unbounded and makes the cap parameter a no-op. Either pass the limit through to the transcript loader or stop reading once the cap is reached.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@claude_code_log/providers/claude.py` around lines 40 - 56, The load_session
method in claude_code_log/providers/claude.py ignores the max_messages parameter
and always returns the full transcript from load_transcript(jsonl_file). Update
load_session to enforce the cap by passing max_messages through to the
transcript loading path if supported, or by truncating iteration after
max_messages entries before returning. Keep the change localized to load_session
and any directly related transcript-loading helper it calls.
|
Thanks for taking up the challenge. However, the abstraction alone might not be the best choice as the first step, since we can't be sure we've nailed it perfectly without at least one example to validate it. In particular, I'm a bit concerned about the The suggestion in #225 was indeed: land the abstraction plus one fully-realized provider — properly typed, threading entries, and modeling tool calls/results — as the quality bar. So, I think it's nice to separate the addition of the providers layer as a single commit as you did already, but on top of that, in this PR, you could just add the Never mind, I just noticed #243... |
|
(Claude) Reviewed as the base for #243 — clean, merge-ready foundation. It's purely additive (5 new files, zero changes to existing code), so the Claude path is untouched and the docs symlinks are preserved (addressing the symlink breakage from #225). |
Replaces #225 — this is the clean base abstraction, split out per maintainer feedback.
What's included
BaseProviderABC +SessionInfodataclass — the contract all providers implementProviderRegistrywith auto-discovery and instantiationClaudeProviderwrapping existing~/.claude/functionality (behavioral no-op)discovery.py) — single entry point for all providersWhat's not included (coming as separate PRs)
--providerCLI flag wiring--all-providerspath filtering compositionVerification
All existing tests pass (2207 on this branch, up from 2138). No behavior change —
ClaudeProviderdelegates entirely to the existingconverter.py+renderer.pypipeline.Summary by CodeRabbit