Conversation
…lub#29) `swamp datastore sync --push` against an @swamp/s3-datastore fails fatally on `_catalog.db-wal` because the SQLite data catalog lives inside the sync cache directory and the walker enqueues all three `_catalog.db*` files for upload. SQLite rewrites the WAL mid-upload with PRAGMA journal_mode=WAL, and the S3 PUT fails. Fix in two layers: 1. Add an `isInternalCacheFile` helper that filters the three pre-existing internal files PLUS basenames `_catalog.db` and anything starting with `_catalog.db-` (matching the SQLite WAL, SHM, and journal sidecars). The helper uses basename matching so it's robust to future changes in the data tier subdirectory. 2. Add passive index cleanup so polluted remote indexes from the bug period heal automatically. A `scrubIndex` method removes zombie `_catalog.db*` entries from the in-memory index on load, and an `indexMutated` flag propagates the cleaned index back to the remote on the next push (even a no-op push). The scrub runs inside `loadIndex` and both branches of `pullIndex` so the persistence boundary enforces the invariant. On the cache-miss branch of `pullIndex` the local cache file is rewritten with the scrubbed JSON for on-disk/in-memory consistency. Applied identically to `@swamp/s3-datastore` and `@swamp/gcs-datastore` since both carry the same defect — the hardcoded three-filename skip list at `s3_cache_sync.ts:252` and `gcs_cache_sync.ts:221`. Seven new unit tests per extension cover: push skips catalog files, push still skips pre-existing internal files, walker-level skip is a true safety net via post-scrub zombie injection, pull skips zombies from remote index, pullIndex S3/GCS-fetch path scrubs and rewrites local cache file, pullIndex cache-hit path scrubs and indexMutated propagates via next push, no-op push propagates scrubbed index then flag resets on second call. Out of scope (tracked separately): - swamp-club#30: `sync --push` calls `pushChanged()` twice per invocation via the coordinator flush path. - swamp-club#31: orphan physical `_catalog.db*` objects remaining in pre-fix buckets. Acceptable trade-off — release notes can direct users to `aws s3 rm` / `gsutil rm` if they want to reclaim the few KB of dead weight. - systeminit/swamp-uat#131: CLI-level UAT coverage for the populated-catalog push scenario. Manifest versions bumped to 2026.04.08.1 for CI auto-publish. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
Blocking Issues
None.
Suggestions
-
Test (c) doesn't isolate the walker's belt-and-suspenders guard. In both
s3_cache_sync_test.tsandgcs_cache_sync_test.ts, test (c) seeds a clean remote index, callspullIndex()(which setsthis.indexnon-null), then injects a zombie entry into the in-memory index, and finally callspullChanged(). However,pullChanged()begins by callingpullIndex()again — and becausethis.index !== null, the cache-hit branch is skipped and the S3/GCS-fetch branch runs, re-fetching the clean remote index and overwriting the injected zombie before the walker loop runs. So the assertion passes becausepullIndex()erased the zombie via a fresh remote fetch, not because the walker'sisInternalCacheFilecheck fired. The production guard is real and correct — this is purely a test isolation issue. To truly exercise the walker guard in isolation, the zombie would need to be injected afterpullIndex()is called internally (or viathis.indexset to non-null through an initial push path). Low priority, the overall behavior is verified. -
The
seedFilehelper in both test files (line ~264 in gcs test, ~938 in s3 test) uses a manualsubstring/lastIndexOfto derive the parent directory.dirname(full)from the already-imported@std/pathwould be cleaner, but this is minor test helper code.
The fix itself is well-designed and correct: isInternalCacheFile cleanly consolidates the skip logic, scrubIndex() + indexMutated correctly handles the migration path for zombie entries in existing remote indexes, and both layers are applied symmetrically to S3 and GCS. Test coverage for the six behavioral scenarios is thorough.
There was a problem hiding this comment.
Adversarial Review
I read all six changed files (both sync implementations, both test files, both manifests). I traced every new code path, the ||= short-circuit semantics, the scrub-during-iteration safety, error-path consistency, and concurrent batch mutation safety. The production code is solid.
Medium
-
Test (c) does not actually exercise the walker-level skip it claims to test —
s3_cache_sync_test.ts:204-251,gcs_cache_sync_test.ts:195-232.The test calls
service.pullIndex()(populatesthis.index), injects a zombie, then callsservice.pullChanged(). ButpullChangedcallspullIndex()again internally. Sincethis.index !== null, the cache-hit branch conditionthis.index === nullisfalse, so it falls through to the S3/GCS-fetch branch. The fetch overwritesthis.indexwith the clean remote index, erasing the injected zombie before the walker loop ever runs. The assertionmock.gets.includes("data/_catalog.db-wal") === falsepasses because re-fetch removed the zombie, not because the walker'sisInternalCacheFilecheck filtered it.Breaking example: If you removed the
isInternalCacheFile(rel)check from thepullChangedwalker loop (lines 241-243 in S3, lines 225-227 in GCS), test (c) would still pass — proving it doesn't test what its name says.Suggested fix: To truly test the walker skip, prevent
pullIndexfrom re-fetching insidepullChanged. Either: (a) inject the zombie into the mock remote index so it survives the re-fetch (and won't be scrubbed since it's a fresh parse... wait,scrubIndexwould catch it). Or (b) seed the zombie in the local cache file with a fresh mtime sopullIndextakes the cache-hit path. To hit the cache-hit path,this.indexmust benull— so you'd need to null it out afterpullIndex, then inject the zombie into the local file, then callpullChanged. The simplest approach: skip the initialpullIndexentirely, seed a local index file containing the zombie with a fresh mtime, construct a fresh service instance, and callpullChanged— the cache-hit path inpullIndexwill fire (fresh file +this.index === null), scrub will remove the zombie, but you can re-inject it post-scrub by using the private-state cast before the walker loop. However, sincepullChangedcallspullIndexatomically before iterating, you can't inject between them without modifying the class. The pragmatic fix: accept this as a documentation issue and rename the test to clarify what it actually verifies (scrub + re-fetch cleanup).Impact: No production risk — the walker-level skip works correctly. This is a test fidelity issue only.
Low
-
||=short-circuit skipsscrubIndex()whenindexMutatedis alreadytrue—s3_cache_sync.ts:184,444,gcs_cache_sync.ts:170,390.this.indexMutated ||= this.scrubIndex()will not callscrubIndex()ifindexMutatedis alreadytrue. In theory, if the index is reloaded from a polluted source afterindexMutatedwas set but before it was reset, zombies could survive in-memory. In practice this cannot happen: the||=sites only execute whenthis.indexwas previouslynull(guarded bythis.index === nullinpullIndexorif (this.index) returninloadIndex), andindexMutatedstarts asfalseand is onlytrueafter a prior scrub — by which pointthis.indexis already set and these paths are unreachable. No actual bug, but the||=is semantically surprising; a plainif (this.scrubIndex()) this.indexMutated = truewould be clearer and avoid the short-circuit footgun for future editors. Contrast with the S3-fetch branch inpullIndexwhich correctly uses direct assignment. -
The S3
pullChangedreconcileslocalMtimeon size-match (lines 252-257) but GCSpullChangeddoes not (lines 230-233) — this is a pre-existing divergence, not introduced by this PR. Noting it since the PR description says the implementations are "literally identical code."
Verdict
PASS — The core fix is correct and well-tested. isInternalCacheFile covers all SQLite sidecar patterns, scrubIndex correctly heals polluted indexes, the indexMutated flag propagates cleanup to the remote, and error paths don't leave inconsistent state. The one medium finding is a test fidelity issue with no production impact.
Summary
Fix for swamp-club issue #29:
swamp datastore sync --pushagainst@swamp/s3-datastorefails fatally on_catalog.db-walbecause the local SQLite data catalog lives inside the sync cache directory and the walker unconditionally enqueues all three_catalog.db*files for upload. Between the walker'sstatand the S3 PUT, SQLite rewrites the WAL underPRAGMA journal_mode=WAL, and the upload fails mid-flight.Fix
Two layers, both in
s3_cache_sync.tsand mirrored ingcs_cache_sync.ts:Layer 1 — skip filter. Add an
isInternalCacheFile(rel)helper that returns true for the three pre-existing internal files (.datastore-index.json,.push-queue.json,.datastore.lock) plus basenames_catalog.dband anything starting with_catalog.db-(matching the SQLite WAL, SHM, and journal sidecars). Applied to the walker inpushChangedand to the remote-index iteration loop inpullChangedas a belt-and-suspenders safety net. Basename matching (not full-path equality) makes the filter robust to any future change in the data tier subdirectory name.Layer 2 — passive index cleanup. Users who hit the bug before this fix have
_catalog.db*entries sitting in their remote.datastore-index.json. Without active cleanup those zombie entries would keep getting re-uploaded on every push forever. A new privatescrubIndex()method removes them from the in-memory index, and aprivate indexMutatedflag propagates the cleaned index back to the remote on the next push — even a no-op push (pushed === 0). The scrub runs insideloadIndex()and both branches ofpullIndex()so the persistence boundary enforces the invariant for any future caller. On the cache-miss branch ofpullIndexthe local cache file is rewritten with the scrubbed JSON so on-disk and in-memory views stay consistent; the cache-hit branch scrubs in-memory only and relies onindexMutatedpropagation on the next push.Why GCS is included in this PR
gcs_cache_sync.tshas the identical three-filename hardcoded skip list at line 221 and the identicalpullChanged/pushChanged/loadIndex/pullIndexstructure. Shipping without the GCS fix would leave a guaranteed defect in a sibling extension. The port is literally identical code — no added risk, no scope creep. The no-shared-internals pattern across extensions means the helper is duplicated verbatim rather than extracted.Test coverage
Seven new unit tests per extension (
s3_cache_sync_test.tsandgcs_cache_sync_test.ts) covering:pushChangedskips_catalog.db,_catalog.db-shm,_catalog.db-walfrom the walkpushChangedstill skips the three pre-existing internal files (regression guard)pullChangedis a true safety net — manually injects a zombie intothis.index.entriespost-scrub via a private-state cast, asserts the walker still skips itpullChangedskips zombie_catalog.db*entries from the remote indexpullIndexS3-fetch path scrubs in-memory AND rewrites the local cache file with the scrubbed JSONpullIndexcache-hit path scrubs in-memory only, then the nextpushChangedpropagates the cleaned index to the remote viaindexMutatedpushChangedwith a polluted on-disk index propagates the scrubbed index to the remote, then the flag resets on a second call (verifies reset semantics)Tests use an in-memory mock client with append-only
putObjectrecording, cast viaas unknown as S3Client— the same escape-hatch pattern already used ins3_lock_test.ts.Tests use a private-state cast (
as unknown as { index, indexMutated }) to observe internal state. Accepted trade-off vs. adding a test-only public getter.Verification: 24 tests pass in S3 extension (7 new + 17 existing), 24 tests pass in GCS extension (7 new + 17 existing). Full
deno task check && deno task lint && deno task fmt && deno task testclean in bothdatastore/s3/anddatastore/gcs/. Before/after verification against the reproduction script from triage confirms the fix: pre-fix walker queues_catalog.db,_catalog.db-shm,_catalog.db-walalongside the legitimate payload; post-fix walker queues only the legitimate payload.What this PR does NOT do
pushChanged()call inswamp datastore sync --push. Tracked in swamp-club#30. Separate risk profile — getting push lifecycle signalling wrong could cause silent data loss, warrants its own focused PR._catalog.db*objects in existing buckets. Tracked in swamp-club#31. Accepted trade-off: the index cleanup removes zombie references, but the physical objects remain as unreferenced dead weight. Affected users can manually delete them withaws s3 rm --recursive 's3://<bucket>/data/_catalog.db'(or the gsutil equivalent for GCS). Few KB of storage; no functional impact.Release notes
Manifest bumps
@swamp/s3-datastore:2026.04.03.1→2026.04.08.1@swamp/gcs-datastore:2026.03.31.1→2026.04.08.1CI auto-publishes both extensions when this lands on main.
Test plan
deno task checkpasses indatastore/s3/anddatastore/gcs/deno task lintpasses in bothdeno task fmtis clean in bothdeno task test— 24 tests pass in each🤖 Generated with Claude Code