Speed up the SQLite metadata-cache build (connection reuse + synchronous=NORMAL) by cboos · Pull Request #251 · daaain/claude-code-log

cboos · 2026-06-29T20:35:39Z

Part of #248.

Problem

Building the SQLite metadata cache dominates per-project processing.
Instrumenting a real (contention-immune) build showed the cost is
connection churn, not fsync: every CacheManager method (reads
included) opened its own sqlite3.connect + PRAGMAs — ~189 connection
opens per single-project build (~3.9ms each), against only 47 commits.
(An earlier "fsync is the bottleneck" reading turned out to be a
measurement artifact from toggling use_cache under load.)

Change

PRAGMA synchronous=NORMAL on cache connections (WAL). The cache
is regenerable, so the durability trade-off — only an OS/power crash
can lose the last commit, not an app crash — is appropriate.
CacheManager.batch() — an opt-in context that reuses one
connection for the duration of a build instead of opening ~189. The
converter wraps each per-project build in it. Outside a batch,
behaviour is byte-for-byte unchanged (open-per-call), preserving the
Windows file-lock semantics the per-call design relied on.
The converter changes are a pure reindent (control flow preserved;
nullcontext keeps the no-cache path identical; nested batches are a
no-op reuse).

Why the open-per-call default is preserved

A persistent connection holds .db/.db-wal/.db-shm open, which on
Windows breaks TemporaryDirectory cleanup and clear_cache rmtree
(WinError 32). batch() therefore closes the shared connection at
scope end (and on exception) before any teardown can run. This is
pinned by tests in test_cache_sqlite_integrity.py, not just asserted:

reuse within a batch; nested batch is a no-op reuse; only the outermost
closes
the shared connection is closed even if the build raises (no stale
connection leaks the lock)
rmtree-after-batch succeeds on Windows
the default path opens a fresh connection per call

Results (local Windows)

connection opens: 189 → 14 (93%) for a single-project build
clean sample convert (min-of-6): 5.44s → 4.10s (~25%)

Cache invalidation (mtime), schema-version rebuild, and concurrent-reader
safety are unchanged and covered by the integrity tests and the existing
cache suite. The connection model is documented in
dev-docs/application_model.md.

Unlike the test-only companions in #248, this is a change to production
code that speeds up real cache builds on every platform (Windows most,
since connection/file overhead is heaviest there).

Summary by CodeRabbit

New Features
- Added batched cache operations for faster work across larger cache-building flows.
- Improved connection handling so repeated cache actions can reuse one shared database connection within a batch.
Bug Fixes
- Reduced the risk of leftover database locks after cache updates.
- Ensured cache connections are closed reliably, even when errors occur.
- Kept cache settings consistent for better SQLite stability and durability.
Tests
- Added coverage for connection reuse, cleanup behavior, nested batches, and persisted writes.

Paired with the existing WAL journal mode, synchronous=NORMAL skips an fsync on every commit while preserving durability across application crashes (only a power/OS crash can lose the last committed transaction). The SQLite cache is fully regenerable from the JSONL source, so that residual risk is acceptable. Measured: ~6x faster on a commit-bound microbenchmark (300 commits: 363ms -> 57ms). NOTE the real-build impact is modest: a full single-project build issues only ~47 commits (~169ms total fsync), so this saves ~145ms. The dominant cache overhead is connection churn (~189 sqlite3.connect calls per build), addressed separately. Adds a test pinning synchronous=NORMAL alongside the WAL-mode test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A full single-project build issued ~189 sqlite3.connect() calls — every CacheManager method (reads included) opened its own connection. That connection churn, not fsync, is the dominant cache-build cost (~3.9ms/open). Add an opt-in batch() context manager that opens one shared connection, has _get_connection() reuse it, and closes it on scope exit (including on exception, via finally). Wrap the converter's build hotspots in batch(): - ensure_fresh_cache: invalidation reads + populate (load + session/ aggregate writes) share one connection. - load_directory_transcripts: the per-file load loop. - _generate_individual_session_files: the per-session staleness checks, html_cache writes, and per-session renderer cache reads. Nesting is a no-op reuse (inner batch leaves the outer connection's lifecycle to the outermost scope), so these compose safely — the process-hierarchy path drives per-project builds through convert_jsonl_to and inherits the reuse. Result: connection opens for a single-project build drop ~189 -> ~14 (93%). Outside a batch the default is unchanged (connection-per-call), which keeps the Windows-safe no-lingering-handle behaviour: batch() closes the shared connection before any temp-dir teardown / clear_cache rmtree (WinError 32). Integrity tests pin reuse, normal+exception close, nesting, write-persistence, and the rmtree-after-build case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Record the as-built connection lifecycle in the cache deep-dive: connection-per-call default (Windows-safe, no lingering handle), synchronous=NORMAL durability tradeoff, and the batch() connection reuse the converter uses for build hotspots. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Harden the field ordering so _shared_conn always exists before any _get_connection() could run, even if migrations are ever routed through it. Currently safe (migrations use the standalone run_migrations), but this removes a latent AttributeError footgun. Per monk review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-29T20:37:14Z

Warning

Review limit reached

@cboos, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 17 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 26f4d5b8-1b84-46ff-bf8c-b9c637fdc916

📥 Commits

Reviewing files that changed from the base of the PR and between d487181 and 51adead.

📒 Files selected for processing (2)

claude_code_log/cache.py
test/test_cache_sqlite_integrity.py

📝 Walkthrough

Walkthrough

CacheManager gains a batch() context manager that opens one SQLite connection and reuses it across nested _get_connection() calls, with guaranteed cleanup on exit. Three converter hotspots (load_directory_transcripts, ensure_fresh_cache, _generate_individual_session_files) are wrapped in this context. Tests validate lifecycle semantics and a synchronous=NORMAL pragma; docs describe the connection model.

Batched SQLite Connection Reuse

Layer / File(s)	Summary
`_configure_connection`, `_get_connection`, and `batch()` implementation `claude_code_log/cache.py`	Adds `_shared_conn` field; extracts `_configure_connection()` for WAL/pragma setup; updates `_get_connection()` to yield the shared connection when inside a batch; adds `batch()` context manager with nesting no-op and guaranteed teardown.
Integration into converter hotspots `claude_code_log/converter.py`	Wraps the per-file loop in `load_directory_transcripts`, the invalidation+repopulation sequence in `ensure_fresh_cache`, and the per-session loop in `_generate_individual_session_files` each in `cache_manager.batch()` (falling back to `contextlib.nullcontext()` when no cache manager is present).
Tests and documentation `test/test_cache_sqlite_integrity.py`, `dev-docs/application_model.md`	Adds `test_synchronous_normal` PRAGMA assertion, a `TestBatchConnectionReuse` suite covering connection reuse, cleanup on normal/exceptional exit, write persistence, and Windows file-lock release; documents the connection model in `application_model.md`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 One connection to rule them all,
No more open-close with every call!
I batch my hops across the field,
WAL mode keeps my writes well-sealed.
The shared conn clears when batch is done—
A tidy burrow, leaks: zero, none!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main optimization: faster SQLite cache builds via connection reuse and synchronous=NORMAL.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch dev/cache-sqlite-batching

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@claude_code_log/cache.py`:
- Around line 346-347: The SQLite connection setup in `_get_connection()` and
the `batch()` path opens a handle before calling `_configure_connection()`, but
if configuration fails the connection is left open and can lock the DB/WAL
files. Update the setup flow around `sqlite3.connect()` and
`_configure_connection()` to use a close-on-failure path so any exception during
initialization closes the connection before re-raising.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7faf2dbd-9d8e-457c-85f0-574aeffac017

📥 Commits

Reviewing files that changed from the base of the PR and between 4bd633f and d487181.

📒 Files selected for processing (4)

claude_code_log/cache.py
claude_code_log/converter.py
dev-docs/application_model.md
test/test_cache_sqlite_integrity.py

_get_connection() and batch() opened the SQLite handle before applying pragmas, so a failure in _configure_connection() would leak the connection and could lock the .db/.db-wal/.db-shm files. Add _open_configured_connection(), which closes the handle before re-raising if setup fails, and use it in both paths. batch() now only publishes to _shared_conn after setup succeeds, so a failed configure never leaves a half-initialised handle for reuse. Add a test asserting the connection is closed (and _shared_conn stays None) when _configure_connection raises, in both paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cboos and others added 4 commits June 25, 2026 10:52

cboos mentioned this pull request Jun 29, 2026

Runtime of tests is 3x to 6x slower on Windows than it is on Linux #248

Closed

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

Comment thread claude_code_log/cache.py Outdated

cboos merged commit 9cd5b7f into main Jun 30, 2026
17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up the SQLite metadata-cache build (connection reuse + synchronous=NORMAL)#251

Speed up the SQLite metadata-cache build (connection reuse + synchronous=NORMAL)#251
cboos merged 5 commits into
mainfrom
dev/cache-sqlite-batching

cboos commented Jun 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading

Review limit reached

Walkthrough

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cboos commented Jun 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Change

Why the open-per-call default is preserved

Results (local Windows)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cboos commented Jun 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading