
feat: base provider abstraction + Claude adapter (phase 1) #242

Open
iansherr wants to merge 1 commit into daaain:main from iansherr:feat/base-abstraction
Conversation

@iansherr iansherr commented Jun 25, 2026

Replaces #225 — this is the clean base abstraction, split out per maintainer feedback.

What's included

  • Base BaseProvider ABC + SessionInfo dataclass — the contract all providers implement
  • Factory helpers for transcript entry creation (~150 lines of boilerplate eliminated)
  • ProviderRegistry with auto-discovery and instantiation
  • ClaudeProvider wrapping existing ~/.claude/ functionality (behavioral no-op)
  • Unified discovery API (discovery.py) — single entry point for all providers
  • Pyright: 0 errors, 0 warnings (strict mode)
  • docs/ symlinks preserved — no copy noise
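
To make the contract in the list above concrete, here is a minimal sketch of what a `SessionInfo` dataclass and a `BaseProvider` ABC could look like. It is not the code from this PR. The names `SessionInfo`, `BaseProvider`, `get_data_dir`, `discover_sessions`, and `load_session(session_id, max_messages)` come from this thread; the field names, the `is_available` helper, the `TranscriptEntry` alias, and `DemoProvider` are assumptions made for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Iterator, Optional

# Assumption: a plain dict stands in for the project's real TranscriptEntry type.
TranscriptEntry = Dict[str, Any]


@dataclass(frozen=True)
class SessionInfo:
    """Provider-neutral session metadata (field names are illustrative)."""

    provider: str
    session_id: str
    path: Path
    updated_at: Optional[str] = None


class BaseProvider(ABC):
    """Contract that every provider is expected to implement."""

    name: str = "base"

    @abstractmethod
    def get_data_dir(self) -> Optional[Path]:
        """Return the provider's data directory, or None if it is unknown."""

    @abstractmethod
    def discover_sessions(self) -> Iterator[SessionInfo]:
        """Yield metadata for every session the provider can see."""

    @abstractmethod
    def load_session(
        self, session_id: str, max_messages: Optional[int] = None
    ) -> Iterator[TranscriptEntry]:
        """Yield the transcript entries of one session."""

    def is_available(self) -> bool:
        # Hypothetical helper: a provider is usable if its data dir exists.
        data_dir = self.get_data_dir()
        return data_dir is not None and data_dir.is_dir()


class DemoProvider(BaseProvider):
    """Hypothetical provider: every *.jsonl file in a directory is a session."""

    name = "demo"

    def __init__(self, data_dir: Optional[Path]) -> None:
        self._data_dir = data_dir

    def get_data_dir(self) -> Optional[Path]:
        return self._data_dir

    def discover_sessions(self) -> Iterator[SessionInfo]:
        data_dir = self.get_data_dir()
        if data_dir is None or not data_dir.is_dir():
            return
        for path in sorted(data_dir.glob("*.jsonl")):
            yield SessionInfo(provider=self.name, session_id=path.stem, path=path)

    def load_session(
        self, session_id: str, max_messages: Optional[int] = None
    ) -> Iterator[TranscriptEntry]:
        # The demo never loads anything; a real provider would parse the file.
        raise FileNotFoundError(f"Session {session_id} not found")
```

Keeping `SessionInfo` frozen is one possible design choice: discovery results can then be shared or cached without a caller mutating them.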

What's not included (coming as separate PRs)

  • AGY adapter (next — with proper entry threading + dropped types addressed)
  • Codex, OpenCode, Gemini adapters
  • --provider CLI flag wiring
  • --all-providers path filtering composition

Verification

All existing tests pass (2207 on this branch, up from 2138). No behavior change — ClaudeProvider delegates entirely to the existing converter.py + renderer.py pipeline.

Summary by CodeRabbit

  • New Features
    • Added support for discovering and loading chat sessions through a unified provider system.
    • Introduced automatic session discovery across available sources, plus per-provider session counts and lookup.
    • Added support for Claude session data, including listing sessions from local project folders and opening individual transcripts.
    • Expanded the public provider interface to standardize session metadata and transcript handling.

- BaseProvider ABC + SessionInfo dataclass for multi-provider support
- Factory helpers: make_user_entry, make_assistant_entry, make_thinking_entry,
  make_tool_use_entry, make_tool_result_entry, extract_text, file_mtime_iso
- ClaudeProvider wrapping existing claude-code-log functionality
- ProviderRegistry with auto-discovery and lazy instantiation
- discover_providers() entry point for provider discovery
- discover_all_sessions() and discover_sessions_by_provider() for unified access
- 0 Pyright errors, 2207 tests pass
- Docs symlinks preserved (not replaced with copies)

iansherr commented Jun 25, 2026

Author

@cboos You make a good point that we're landing a lot. I've been chewing away at making this less of a POC and more of a feature hub, so we can easily add more providers as time goes on, if wanted. I'm updating this PR as the main hub, then will submit AGY as a separate PR. Once you're happy with those, I can refine the other providers (Codex, OpenCode, etc.).

What's included in this PR:

  • Base BaseProvider ABC + SessionInfo dataclass
  • Factory helpers for transcript entry creation (~150 lines of boilerplate eliminated)
  • ProviderRegistry with auto-discovery + instantiation
  • ClaudeProvider wrapping existing ~/.claude/ functionality
  • Unified discovery API (discovery.py)
  • Pyright: 0 errors, 0 warnings
  • docs/ symlinks preserved (not replaced with copies)

All existing tests pass (2207 on this branch). No behavior change — ClaudeProvider delegates entirely to the existing converter.py + renderer.py pipeline.

Edited to add: This is my first substantive PR to a project. Thanks for your patience as I keep learning the ropes.

coderabbitai Bot commented Jun 25, 2026

Walkthrough

Adds shared provider abstractions and session metadata, a Claude provider implementation, a registry for provider discovery and loading, and public discovery helpers plus package exports.

Changes

Provider discovery and registry

| Layer / File(s) | Summary |
| --- | --- |
| Session model and provider contract / claude_code_log/providers/base.py | SessionInfo, transcript builders, and BaseProvider define the shared session and provider contract used by the discovery flow. |
| Registry storage and lookup / claude_code_log/providers/registry.py | ProviderRegistry stores provider instances and classes, instantiates registered classes, and exposes provider lookup and availability queries. |
| Claude provider discovery and loading / claude_code_log/providers/claude.py | ClaudeProvider resolves ~/.claude/projects, discovers non-agent- JSONL sessions, and loads matching transcripts from project subfolders. |
| Registry session flow and bootstrap / claude_code_log/providers/registry.py | ProviderRegistry discovers sessions, counts sessions per provider, loads sessions by provider name, and discover_providers() registers ClaudeProvider. |
| Public discovery helpers and exports / claude_code_log/discovery.py, claude_code_log/providers/__init__.py | discover_all_sessions, discover_sessions_by_provider, get_session_stats, and load_session delegate through the registry, and the providers package re-exports the public surface. |

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant discover_all_sessions
  participant discover_providers
  participant ProviderRegistry
  participant ClaudeProvider

  Caller->>discover_all_sessions: request session enumeration
  discover_all_sessions->>discover_providers: build registry
  discover_providers->>ProviderRegistry: register ClaudeProvider
  discover_providers->>ProviderRegistry: instantiate_registered()
  discover_all_sessions->>ProviderRegistry: get_available_providers()
  ProviderRegistry->>ClaudeProvider: discover_sessions()
  ClaudeProvider-->>ProviderRegistry: SessionInfo values
  ProviderRegistry-->>discover_all_sessions: SessionInfo values
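
The sequence above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the PR's code: `FakeProvider` stands in for `ClaudeProvider`, sessions are bare strings rather than `SessionInfo` values, and `register_class` and `is_available` are hypothetical names. The names `register`, `instantiate_registered`, `get_available_providers`, `discover_providers`, and `discover_all_sessions` are taken from the diagram and review.

```python
from typing import Dict, Iterator, List, Type


class FakeProvider:
    """Hypothetical stand-in for ClaudeProvider; yields session ids only."""

    name = "fake"

    def is_available(self) -> bool:
        return True

    def discover_sessions(self) -> Iterator[str]:
        yield from ("session-a", "session-b")


class ProviderRegistry:
    def __init__(self) -> None:
        self._providers: Dict[str, FakeProvider] = {}
        self._provider_classes: Dict[str, Type[FakeProvider]] = {}

    def register_class(self, provider_class: Type[FakeProvider]) -> None:
        # Hypothetical name: store the class so it can be instantiated later.
        self._provider_classes[provider_class.name] = provider_class

    def register(self, provider: FakeProvider) -> None:
        self._providers[provider.name] = provider

    def instantiate_registered(self) -> None:
        for provider_class in self._provider_classes.values():
            self.register(provider_class())

    def get_available_providers(self) -> List[FakeProvider]:
        return [p for p in self._providers.values() if p.is_available()]


def discover_providers() -> ProviderRegistry:
    """Build a registry, register the known provider classes, instantiate them."""
    registry = ProviderRegistry()
    registry.register_class(FakeProvider)
    registry.instantiate_registered()
    return registry


def discover_all_sessions() -> Iterator[str]:
    """Enumerate sessions from every available provider, as in the diagram."""
    registry = discover_providers()
    for provider in registry.get_available_providers():
        yield from provider.discover_sessions()
```

Calling `list(discover_all_sessions())` walks the same path as the diagram: the registry is built first, then each available provider is asked for its sessions.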

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A bunny hopped through logs tonight,
գտ? no—just paw prints, neat and light. 🐰
New sessions bloom in registry rows,
While Claude on moonbeam gently glows.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly reflects the main changes: a new base provider abstraction and a Claude provider adapter. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

⚠️ This pull request shows signs of AI-generated slop (defensive_cruft). It has been flagged by CodeRabbit slop detection and should be reviewed carefully.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
claude_code_log/providers/registry.py (2)

68-70: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add an explicit return type annotation for consistency.

Every other method on ProviderRegistry is annotated; load_session returns provider.load_session(...), i.e. Iterator[TranscriptEntry]. Annotating it keeps the public surface consistent and explicit.

Proposed change
     def load_session(
         self, provider_name: str, session_id: str, max_messages: Optional[int] = None
-    ):
+    ) -> Iterator[TranscriptEntry]:

(requires importing TranscriptEntry via from .base import ..., plus Iterator if it is not already imported)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude_code_log/providers/registry.py` around lines 68 - 70, Add an explicit
return type annotation to ProviderRegistry.load_session for consistency with the
rest of the class. Update the load_session method signature in ProviderRegistry
to declare the Iterator[TranscriptEntry] return type, and make sure
TranscriptEntry is imported alongside the existing base provider symbols so the
public API stays explicit and consistent.

29-36: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Silently swallowing provider init failures hampers diagnosability.

Catching Exception and discarding it with pass means a misconfigured or broken provider disappears with no trace, making field issues very hard to debug. At minimum, log the failure (with the provider name) at warning or debug level.

Proposed change
     def instantiate_registered(self) -> None:
-        for provider_class in self._provider_classes.values():
+        for name, provider_class in self._provider_classes.items():
             try:
                 provider = provider_class()
                 self.register(provider)
-            except Exception:
-                # Skip providers that fail to initialize
-                pass
+            except Exception:
+                logger.warning(
+                    "Skipping provider %r: failed to initialize", name, exc_info=True
+                )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude_code_log/providers/registry.py` around lines 29 - 36, The
instantiate_registered method in the registry currently swallows provider
initialization failures with a bare except, which hides broken providers; update
this path to log the exception at warning or debug level and include the
provider class name before skipping it. Use the existing instantiate_registered
and register flow in ProviderRegistry to identify the failing provider, and keep
the fallback behavior of continuing with the remaining providers.
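
As a runnable illustration of the logging suggestion above, the following sketch isolates the instantiate-and-skip loop in a free function. The function name `instantiate_all`, the logger name, and the two demo classes are hypothetical; only the log message and the `exc_info=True` pattern mirror the proposed diff.

```python
import logging
from typing import Callable, Dict, List

# Assumption: a module-level logger; the real module may name it differently.
logger = logging.getLogger("provider_registry_demo")


def instantiate_all(provider_classes: Dict[str, Callable[[], object]]) -> List[object]:
    """Instantiate each provider, logging the ones that fail instead of hiding them."""
    providers: List[object] = []
    for name, provider_class in provider_classes.items():
        try:
            providers.append(provider_class())
        except Exception:
            # exc_info=True attaches the active traceback to the log record,
            # so the cause of the failure is visible without re-raising.
            logger.warning(
                "Skipping provider %r: failed to initialize", name, exc_info=True
            )
    return providers


class GoodProvider:
    """Hypothetical provider that initializes cleanly."""


class BrokenProvider:
    """Hypothetical provider whose constructor always fails."""

    def __init__(self) -> None:
        raise RuntimeError("config missing")
```

With `instantiate_all({"good": GoodProvider, "broken": BrokenProvider})`, only the `GoodProvider` instance is returned, and the warning for `"broken"` carries the `RuntimeError` traceback. The fallback behavior of continuing with the remaining providers is unchanged.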
claude_code_log/providers/base.py (1)

49-50: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low value

file_mtime_iso emits a timezone-naive local timestamp.

datetime.fromtimestamp(...) without a tz argument produces a naive local-time value, so created_at/updated_at strings lack an offset. Once sessions from multiple providers (or machines in different zones) are sorted/compared, this is ambiguous. Consider anchoring to UTC.

Proposed change
-from datetime import datetime
+from datetime import datetime, timezone
@@
 def file_mtime_iso(path: Path) -> str:
-    return datetime.fromtimestamp(path.stat().st_mtime).isoformat()
+    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc).isoformat()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude_code_log/providers/base.py` around lines 49 - 50, The file_mtime_iso
helper currently returns a timezone-naive local timestamp, so the
created_at/updated_at values are ambiguous across machines and providers. Update
file_mtime_iso in base.py to produce a timezone-aware UTC ISO string by
anchoring datetime.fromtimestamp to UTC, and keep the change localized to this
helper so callers continue using file_mtime_iso unchanged.
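
The difference between the two forms is easy to check in isolation. The sketch below applies the same `datetime.fromtimestamp` calls to a fixed epoch value; `epoch_to_iso_naive` and `epoch_to_iso_utc` are hypothetical helper names introduced only to make the comparison testable without touching the filesystem.

```python
from datetime import datetime, timezone
from pathlib import Path


def epoch_to_iso_naive(ts: float) -> str:
    """Local time with no offset: ambiguous once machines or zones differ."""
    return datetime.fromtimestamp(ts).isoformat()


def epoch_to_iso_utc(ts: float) -> str:
    """Timezone-aware UTC: the string carries an explicit +00:00 offset."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def file_mtime_iso(path: Path) -> str:
    """UTC-anchored variant of the helper, as proposed in the review."""
    return epoch_to_iso_utc(path.stat().st_mtime)
```

Because every UTC string ends in the same `+00:00` offset, lexicographic order of same-precision timestamps matches chronological order, which is not guaranteed for naive local strings gathered from different zones.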
claude_code_log/discovery.py (3)

45-55: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add a docstring to get_session_stats.

Every other public function in this module has a docstring; get_session_stats is the only one without one. Document the return shape ({provider_name: session_count}) for consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude_code_log/discovery.py` around lines 45 - 55, Add a docstring to
get_session_stats in discovery.py so it matches the other public functions in
the module. Describe that it returns a dict mapping provider names to session
counts, and note the return shape as {provider_name: session_count} for
consistency.

9-29: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider delegating to the registry to avoid duplicated discovery logic.

discover_all_sessions (no filter) and get_session_stats reimplement the iterate-available-providers / yield-from-discover_sessions flow that ProviderRegistry already exposes (discover_all_sessions, discover_sessions_by_provider). The provider-name filter here is the only added value; the rest could route through the registry to keep a single source of truth for availability and iteration semantics.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude_code_log/discovery.py` around lines 9 - 29, discover_all_sessions
currently duplicates the provider iteration and availability checks that
ProviderRegistry already owns. Update discover_all_sessions to delegate to the
registry’s discover_all_sessions and discover_sessions_by_provider helpers,
keeping only the provider-name filter logic here. Use discover_providers,
ProviderRegistry, and discover_sessions_by_provider to centralize availability
and iteration semantics in one place.
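
A sketch of the delegated shape this comment describes, under stated assumptions: the module-level functions keep only the provider-name filter and the counting, while iteration lives in the registry. `StubRegistry` is a hypothetical stand-in, sessions are `(provider_name, session_id)` tuples rather than `SessionInfo` values, and the registry is passed in explicitly here only to keep the example self-contained (the functions in this PR obtain it from `discover_providers()`).

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Assumption: (provider_name, session_id) stands in for SessionInfo.
Session = Tuple[str, str]


class StubRegistry:
    """Hypothetical registry exposing the two helpers named in the review."""

    def __init__(self, sessions: Dict[str, List[str]]) -> None:
        self._sessions = sessions

    def discover_all_sessions(self) -> Iterator[Session]:
        for name in self._sessions:
            yield from self.discover_sessions_by_provider(name)

    def discover_sessions_by_provider(self, provider_name: str) -> Iterator[Session]:
        for session_id in self._sessions.get(provider_name, []):
            yield (provider_name, session_id)


def discover_all_sessions(
    registry: StubRegistry, provider_name: Optional[str] = None
) -> Iterator[Session]:
    """Delegate to the registry; the name filter is the only logic kept here."""
    if provider_name is None:
        return registry.discover_all_sessions()
    return registry.discover_sessions_by_provider(provider_name)


def get_session_stats(registry: StubRegistry) -> Dict[str, int]:
    """Return ``{provider_name: session_count}`` for providers with sessions."""
    stats: Dict[str, int] = {}
    for name, _session_id in registry.discover_all_sessions():
        stats[name] = stats.get(name, 0) + 1
    return stats
```

One consequence of counting this way is that a provider with zero sessions does not appear in the result; whether that matches the intended contract of `get_session_stats` is not stated in this thread.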

58-69: 📐 Maintainability & Code Quality | 🔵 Trivial

Add a return annotation to load_session.

claude_code_log/discovery.py:58 should mirror the provider API with -> Iterator[TranscriptEntry]; TranscriptEntry is already available from .providers.base, so the wrapper can expose the same contract explicitly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude_code_log/discovery.py` around lines 58 - 69, Add an explicit return
type to load_session in discovery.py so the wrapper matches the provider API;
update the function signature to return Iterator[TranscriptEntry], using the
already imported TranscriptEntry from .providers.base, and keep the body
delegating through discover_providers().load_session(provider_name, session_id).
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bed43211-85e1-4f3e-ab40-9fe78af5e58e

📥 Commits

Reviewing files that changed from the base of the PR and between 4bd633f and 997da83.

📒 Files selected for processing (5)
  • claude_code_log/discovery.py
  • claude_code_log/providers/__init__.py
  • claude_code_log/providers/base.py
  • claude_code_log/providers/claude.py
  • claude_code_log/providers/registry.py

Comment on lines +40 to +56
    def load_session(
        self, session_id: str, max_messages: Optional[int] = None
    ) -> Iterator[TranscriptEntry]:
        from claude_code_log.converter import load_transcript

        data_dir = self.get_data_dir()
        if data_dir is None:
            raise ValueError("Claude data directory not found")

        for project_dir in data_dir.iterdir():
            if not project_dir.is_dir():
                continue
            jsonl_file = project_dir / f"{session_id}.jsonl"
            if jsonl_file.exists():
                return iter(load_transcript(jsonl_file))

        raise FileNotFoundError(f"Session {session_id} not found")

🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect load_transcript signature/return type
fd -t f 'converter.py' claude_code_log
ast-grep run --pattern 'def load_transcript($$$):
  $$$' --lang python $(fd -t f 'converter.py' claude_code_log)

Repository: daaain/claude-code-log

Length of output: 189


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== outline converter.py =="
ast-grep outline claude_code_log/converter.py --view expanded || true

echo
echo "== search load_transcript / max_messages usages =="
rg -n "load_transcript|max_messages" claude_code_log

Repository: daaain/claude-code-log

Length of output: 4261


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== read converter.py relevant section =="
wc -l claude_code_log/converter.py
sed -n '1,260p' claude_code_log/converter.py

echo
echo "== read provider/registry relevant sections =="
rg -n "def load_session|max_messages|load_transcript" claude_code_log/providers claude_code_log | sed -n '1,200p'

Repository: daaain/claude-code-log

Length of output: 12573


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,260p' claude_code_log/converter.py
echo
sed -n '1,220p' claude_code_log/providers/claude.py
echo
rg -n "max_messages|load_session\\(" claude_code_log

Repository: daaain/claude-code-log

Length of output: 12196


max_messages needs to be enforced in claude_code_log/providers/claude.py::load_session

max_messages is accepted and forwarded by the registry, but this method always returns load_transcript(jsonl_file) in full. That leaves large sessions unbounded and makes the cap parameter a no-op. Either pass the limit through to the transcript loader or stop reading once the cap is reached.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude_code_log/providers/claude.py` around lines 40 - 56, The load_session
method in claude_code_log/providers/claude.py ignores the max_messages parameter
and always returns the full transcript from load_transcript(jsonl_file). Update
load_session to enforce the cap by passing max_messages through to the
transcript loading path if supported, or by truncating iteration after
max_messages entries before returning. Keep the change localized to load_session
and any directly related transcript-loading helper it calls.
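
One way to apply the truncation option described above is `itertools.islice`, sketched here as a standalone helper. The name `cap_entries` is hypothetical. Note the limitation: this caps what the caller iterates over, and whether it also bounds what is read from disk depends on whether `load_transcript` produces entries lazily, which this thread does not show.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def cap_entries(entries: Iterable[T], max_messages: Optional[int] = None) -> Iterator[T]:
    """Yield at most ``max_messages`` entries; ``None`` means no cap.

    islice accepts None as its stop value, so the uncapped case needs no branch.
    A negative max_messages raises ValueError, which surfaces caller mistakes.
    """
    return islice(entries, max_messages)
```

Inside `load_session`, the return statement could then become `return cap_entries(load_transcript(jsonl_file), max_messages)`, which keeps the change localized to that method.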

cboos commented Jun 28, 2026

Collaborator

Thanks for taking up the challenge. However, the abstraction alone might not be the best choice as the first step, since we can't be sure we've nailed it perfectly without at least one example to validate it. In particular, I'm a bit concerned about the parentUuid=None parts.

The suggestion in #225 was indeed: land the abstraction plus one fully-realized provider — properly typed, threading entries, and modeling tool calls/results — as the quality bar.

So, I think it's nice to separate the addition of the providers layer as a single commit as you did already, but on top of that, in this PR, you could just add the agy example in one or more additional commits, exercising the different aspects of the support, so as to fully shape the abstraction together with one instantiation.


Never mind, I just noticed #243...

cboos commented Jun 28, 2026

Collaborator

(Claude) Reviewed as the base for #243 — clean, merge-ready foundation. It's purely additive (5 new files, zero changes to existing code), so the Claude path is untouched and the docs symlinks are preserved (addressing the symlink breakage from #225). ClaudeProvider.load_session delegates to the existing load_transcript, so it's a true behavioral no-op. pyright 0/0/0 (the 274 from #225 are gone), ty at baseline, 2207 tests pass. Clean ABC with typed factory helpers, and the registry encapsulates instantiation (fixing the earlier reportPrivateUsage). Full review + the forward direction (structural tool rendering toward parity with Claude's log output) is on #243.
