fix: batch upserts in miner to prevent ChromaDB 1.5.x compaction crashes by IzmanIzy · Pull Request #796 · MemPalace/mempalace

IzmanIzy · 2026-04-13T11:43:47Z

Summary

Batch all chunks per file into a single collection.upsert() call instead of upserting each chunk individually. This reduces WAL write pressure that causes the Rust compactor in ChromaDB >= 1.5 to crash.
Add periodic checkpoints (every 200 files) that release and re-acquire the collection, giving the compactor time to flush background work.
Release collection reference at the end of mining for a clean shutdown.
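
A minimal sketch of the per-file batching described above, not the PR's actual diff. The helper name `upsert_file_chunks`, the ID scheme, and the metadata keys are hypothetical; the only library call assumed is ChromaDB's `Collection.upsert(ids=..., documents=..., metadatas=...)`. A stub collection stands in so the example runs without ChromaDB installed.

```python
from typing import Any


def upsert_file_chunks(collection: Any, source_file: str, chunks: list[str]) -> int:
    """Write every chunk of one file with a single upsert() call.

    `collection` is anything exposing an upsert(ids, documents, metadatas)
    method, e.g. a ChromaDB collection. Returns the number of chunks written.
    """
    if not chunks:
        return 0  # nothing to write, so no upsert call is issued

    # Hypothetical ID scheme: "<file>:<chunk index>".
    ids = [f"{source_file}:{i}" for i in range(len(chunks))]
    metadatas = [
        {"source_file": source_file, "chunk_index": i} for i in range(len(chunks))
    ]

    # One call per file instead of one call per chunk.
    collection.upsert(ids=ids, documents=list(chunks), metadatas=metadatas)
    return len(chunks)


class RecordingCollection:
    """Stub that records upsert() calls; stands in for a real collection."""

    def __init__(self) -> None:
        self.calls: list[tuple[list[str], list[str], list[dict[str, Any]]]] = []

    def upsert(self, ids, documents, metadatas) -> None:
        self.calls.append((ids, documents, metadatas))
```

With this shape, a file that yields three chunks produces one recorded `upsert()` call carrying three IDs, rather than three separate calls.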

Problem

When mining projects with 100+ files, the miner issues thousands of individual upserts. On ChromaDB 1.5.x this causes:

Segfault (exit code 139) — the Rust compactor corrupts the metadata segment during concurrent individual writes
InternalError: Failed to apply logs to the metadata segment — WAL entries accumulate faster than compaction can process them
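
Exit code 139 follows the common shell convention of reporting a signal-terminated process as 128 plus the signal number, and SIGSEGV is signal 11. A small sketch for decoding such codes when triaging crash reports (the helper name `describe_exit_code` is illustrative and not part of the project):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit code into a readable description."""
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"exit code {code} (unknown signal {signum})"
    return f"exit code {code}"
```

Note that when a child process is launched through Python's `subprocess` module on POSIX, the same crash is reported as `returncode == -11` rather than 139.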

Both errors depend on project size: they rarely appear in small test suites but occur consistently on real-world knowledge bases (300+ files).

Root Cause

ChromaDB's Rust compactor (introduced in 1.5.x) runs in a background thread. Individual upserts create one WAL entry each, and 2000+ entries in rapid succession overwhelm the compactor's ability to merge them atomically. The previous code already had a comment about hnswlib's thread-unsafe updatePoint path causing segfaults on macOS ARM — this is the same class of bug on the compaction side, now affecting all platforms.

Testing

37 existing miner tests pass (pytest tests/ -k "mine" — 37 passed, 0 failed)
Verified on a real 406-file knowledge base (3026 drawers) with ChromaDB 1.5.7 — zero crashes, clean completion with two checkpoint flushes at file 200 and 400
Lint (ruff check) and format (ruff format --check) pass
No changes to public API or CLI interface
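
The checkpoint cadence reported above (two flushes, at files 200 and 400, for a 406-file run) is consistent with a counter-based loop like the sketch below. The names `mine_files`, `open_collection`, `process_file`, and `CHECKPOINT_EVERY` are hypothetical, and how the PR actually releases and re-acquires the collection is not shown on this page.

```python
from typing import Any, Callable, Iterable

CHECKPOINT_EVERY = 200  # interval stated in the PR summary


def mine_files(
    files: Iterable[str],
    open_collection: Callable[[], Any],
    process_file: Callable[[Any, str], None],
    checkpoint_every: int = CHECKPOINT_EVERY,
) -> list[int]:
    """Process files in order and return the file counts at which checkpoints ran."""
    checkpoints: list[int] = []
    collection = open_collection()
    for count, path in enumerate(files, start=1):
        process_file(collection, path)
        if count % checkpoint_every == 0:
            # Checkpoint: drop the reference, then re-acquire the collection.
            collection = None
            collection = open_collection()
            checkpoints.append(count)
    # Release the reference once mining is done.
    collection = None
    return checkpoints
```

Run over 406 placeholder paths, this loop checkpoints at counts 200 and 400 and acquires the collection three times in total (once at the start, once per checkpoint).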

Test plan

ruff check . passes
ruff format --check . passes
pytest tests/ -v --ignore=tests/benchmarks -k "mine" — 37 passed
Manual test: mine 406-file project with ChromaDB 1.5.7 — 3026 drawers, 0 crashes
Verify re-mining (modified files) still works correctly
Test with ChromaDB 0.6.x to confirm backward compatibility

🤖 Generated with Claude Code

The project miner previously upserted each chunk individually, which causes excessive WAL turnover in ChromaDB >= 1.5. The Rust compactor cannot keep up with the write rate, leading to either:

- `InternalError: Failed to apply logs to the metadata segment`
- Segfault (SIGSEGV, exit code 139) during background compaction

This change batches all chunks from a single file into one `upsert()` call and introduces periodic checkpoints (every 200 files) that release and re-acquire the collection, giving the compactor time to flush.

Tested on a 406-file knowledge base (3026 drawers) with ChromaDB 1.5.7: zero crashes, clean completion with two checkpoint flushes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jphein · 2026-04-13T14:35:21Z

This addresses the same WAL pressure issue we tackled in #629 — worth noting that PR also adds bulk mtime pre-fetch (bulk_check_mined()) and optional concurrent mining via --workers. The single-file batch approach here is cleaner for a targeted fix though. Might be worth coordinating so the two PRs don't conflict — happy to rebase #629 on top of this if it merges first.

IzmanIzy · 2026-04-13T15:47:43Z

IzmanIzy requested review from bensig and milla-jovovich as code owners April 13, 2026 11:43

igorls added the area/mining (File and conversation mining) and bug (Something isn't working) labels Apr 14, 2026

IzmanIzy wants to merge 1 commit into MemPalace:main from IzmanIzy:fix/batch-upsert-chromadb-compaction
