fix: data gc preview and --dry-run include version-count GC (#108) #1176
Interactive `swamp data gc` silently skipped version-count garbage collection when no lifetime-expired data existed, and `--dry-run` under-reported the same work. Both ran correctly only via `--force` or `--json`.

Add a `dryRun` option to `collectGarbage` so the repository can compute would-be counts without deleting, and expose it through a new `previewVersionGarbage` service method. Extend `DataGcPreview` with a `versionGcItems` field, widen the CLI early-return to consider both categories, and render the new category in log and json preview output.

Also consolidate the duplicated `parseDuration` helper into `src/domain/data/duration.ts`, shared by the lifecycle service and the unified data repository.

Closes #108

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CLI UX Review
Blocking
None.
Suggestions
- Inconsistent capitalization in log-mode preview (`src/presentation/renderers/data_gc.ts` lines 104, 111): The first log line uses `GC preview:` (uppercase GC) while the new version GC line uses `version gc:` (lowercase). Suggest either `Version GC:` or keeping the style uniform. Small, but users scanning terminal output will notice the mismatch.
- "GC preview: 0 expired data items" when only version GC work exists (`src/presentation/renderers/data_gc.ts` line 104): When a repo has zero lifetime-expired entries but does have excess versions, the log output is `GC preview: 0 expired data items` followed by `version gc: 2 models with 12 excess versions`. The first line reads oddly when its count is zero. Consider guarding it with `if (preview.items.length > 0)`, or combining both into a single summary line.
- Log-mode completion message omits version deletion count (`src/presentation/renderers/data_gc.ts` line 37): `GC complete: deleted N items` does not surface `versionsDeleted`, so after `--force` or the interactive confirm path in log mode, users see no confirmation of how many versions were pruned. This gap is pre-existing, but the fix makes it more visible since version GC is now a first-class operation. A follow-up like `GC complete: deleted N data items, M excess versions` would close the loop.
- JSON branch of `renderDataGcPreview` is unreachable (`src/presentation/renderers/data_gc.ts` lines 86–101): Phase 1 (preview) is guarded by `cliCtx.outputMode === "log"`, so the JSON branch of `renderDataGcPreview` is never called. The new JSON fields (`versionGcModelCount`, `versionGcVersionCount`, `versionGcData`) are correct and useful, but a script author running `swamp data gc --json` will only see the Phase 2 `completed` shape, not the preview shape. This is also pre-existing architecture, worth noting for a future ticket.
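The first two suggestions, together with the widened early-return, can be sketched as below. This is a minimal illustration, not the code in `data_gc.ts`: the `PreviewSketch` shape, the `excessVersions` field, and both function names are hypothetical stand-ins, and the functions return strings instead of writing to a logger.

```typescript
// Hypothetical preview shape; the real DataGcPreview fields may differ.
interface VersionGcItemSketch {
  modelName: string;
  excessVersions: number;
}

interface PreviewSketch {
  items: unknown[]; // lifetime-expired entries
  versionGcItems: VersionGcItemSketch[]; // models with excess versions
}

// Widened early-return check: there is work if either category is non-empty.
function hasGcWork(preview: PreviewSketch): boolean {
  return preview.items.length > 0 || preview.versionGcItems.length > 0;
}

// Log-mode preview lines: the expired-data line is skipped when its count
// is zero, and both lines use the same "GC" capitalization.
function renderPreviewLines(preview: PreviewSketch): string[] {
  const lines: string[] = [];
  if (preview.items.length > 0) {
    lines.push(`GC preview: ${preview.items.length} expired data items`);
  }
  if (preview.versionGcItems.length > 0) {
    const totalVersions = preview.versionGcItems.reduce(
      (sum, item) => sum + item.excessVersions,
      0,
    );
    lines.push(
      `Version GC: ${preview.versionGcItems.length} models with ${totalVersions} excess versions`,
    );
  }
  return lines;
}
```

With no expired items and two models carrying 5 and 7 excess versions, `hasGcWork` returns `true` and the only line produced is the version GC summary.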
Verdict
PASS — the fix is correct and the user-visible changes are accurate: the early-return now covers both GC categories, the preview surfaces version GC work in both modes, and --dry-run now faithfully reports what would be pruned. No blocking issues.
Code Review
Clean, well-structured fix for a real bug (version-count GC silently skipped in interactive + dry-run paths). The root cause analysis is accurate: the `if (!dryRun)` guard around Phase 2 and the early-return on empty `preview.items` were both problematic. The chosen approach of pushing dry-run support into the repository layer is sound: one source of truth for what gets removed.
Blocking Issues
None.
Suggestions
- Double `findAllGlobal` during interactive preview: In the interactive path (no `--force`, no `--dry-run`), `dataGcPreview` now calls both `findExpiredData()` and `previewVersionGarbage()`, each of which independently calls `findAllGlobal()`. Then the actual `deleteExpiredData()` call does a third scan. This is a reasonable correctness-over-performance trade-off for a GC operation, but if repos grow large, a future optimization could share the scan across preview methods (similar to how `deleteExpiredData` already reuses its single `findAllGlobal` result for both phases). Not blocking; just a note in case performance becomes an issue.
- `totalVersions` computed twice in renderer: In `renderDataGcPreview` (`src/presentation/renderers/data_gc.ts:87` and `:106`), the `reduce` over `versionGcItems` is duplicated across the `json` and `log` branches. Minor, and since the branches are independent, this is perfectly fine as-is.

DDD-wise, the changes respect the existing layering:

- `parseDataDuration` properly extracted to a domain value object operation in `src/domain/data/duration.ts`, re-exported through `mod.ts`
- `VersionGcPreviewInfo` is a clean DTO on the domain service interface
- Repository dry-run semantics stay at the infrastructure layer
- CLI/renderers import from `libswamp/mod.ts`; import boundary respected
- New types exported through `src/libswamp/mod.ts`

Test coverage is thorough: unit tests for the extracted duration parser, service-level tests for dry-run pass-through and `previewVersionGarbage`, libswamp-level tests for preview aggregation, renderer tests, and an integration test for the repository dry-run path.
Adversarial Review
Critical / High
None found. The core logic is correct: the `dryRun` flag is properly threaded through `collectGarbage`, the `!dryRun` guard correctly protects both `Deno.remove` and the post-deletion `updateLatestMarker`/catalog update, and the options parameter is optional with a safe `?? false` default so all existing callers (including the `() =>` stubs in model/driver/validation tests) continue to satisfy the interface.
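The dry-run pattern described above can be sketched with an in-memory store. Everything here is a hypothetical stand-in (`InMemoryVersionStore`, `GcOptions`, `GcResult`, and the `keep` parameter are not swamp's types); it only illustrates an optional options argument with a `?? false` default and a `!dryRun` guard around the destructive step.

```typescript
// Hypothetical option and result shapes, for illustration only.
interface GcOptions {
  dryRun?: boolean;
}

interface GcResult {
  versionsDeleted: number;
}

class InMemoryVersionStore {
  // Version numbers per model, oldest first.
  private versions = new Map<string, number[]>();

  add(model: string, version: number): void {
    const list = this.versions.get(model) ?? [];
    list.push(version);
    this.versions.set(model, list);
  }

  count(model: string): number {
    return this.versions.get(model)?.length ?? 0;
  }

  // Keeps the newest `keep` versions. The count is always computed;
  // the deletion only happens when dryRun is false.
  collectGarbage(model: string, keep: number, options?: GcOptions): GcResult {
    const dryRun = options?.dryRun ?? false;
    const list = this.versions.get(model) ?? [];
    const excess = Math.max(0, list.length - keep);
    if (!dryRun && excess > 0) {
      this.versions.set(model, list.slice(excess));
    }
    return { versionsDeleted: excess };
  }
}

const store = new InMemoryVersionStore();
for (let v = 1; v <= 10; v++) store.add("model-a", v);

// Dry run: reports 7 would-be deletions and leaves all 10 versions in place.
const dryRunResult = store.collectGarbage("model-a", 3, { dryRun: true });
const countAfterDryRun = store.count("model-a");

// Real run: callers that omit the options argument keep the old behavior.
const realResult = store.collectGarbage("model-a", 3);
```

Because the count is computed on the same code path in both modes, the dry-run report and the real deletion cannot drift apart, which is the "one source of truth" property the code review calls out.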
Medium
- Preview can double-count when lifetime-expired data also has version GC excess (`src/libswamp/data/gc.ts:124`). `dataGcPreview` runs `findExpiredData()` and `previewVersionGarbage()` independently via `Promise.all`. A data item with e.g. `lifetime: "1d"` (expired) and `gc: 3` with 10 versions would appear in both: as an expired item (all 10 versions) AND as having 7 excess versions. The JSON preview would report `expiredDataCount: 1` plus `versionGcVersionCount: 7`, overstating the total work. In practice this is mitigated: (a) log mode only shows `dataEntriesExpired` in the final result, not `versionsDeleted`, so the discrepancy is invisible; (b) `--dry-run` via `deleteExpiredData({dryRun: true})` skips Phase 1 deletion, so Phase 2 correctly sees and counts the data. Still, the interactive JSON preview can mislead in this edge case.
- Triple `findAllGlobal()` scan in interactive path (`src/domain/data/data_lifecycle_service.ts:247` and `:285`). In the interactive path: (1) `findExpiredData()` → `findAllGlobal()`, (2) `previewVersionGarbage()` → `findAllGlobal()` + `collectGarbage(dry)` per unique model, (3) `deleteExpiredData()` → `findAllGlobal()` + `collectGarbage` per unique model. For repos with many models, this triples the I/O. Not a correctness issue but worth noting for repos with thousands of model instances.
Low
- `parseDataDuration("0d")` returns 0 (`src/domain/data/duration.ts:29`). The regex `^\d+(mo|y|h|m|d|w)$` matches `"0d"`, yielding 0 milliseconds. This is safe in practice because `GarbageCollectionSchema` rejects zero-value strings via its `.refine()` check, and lifetime values normalize zero durations to `"workflow"`. Only a direct call bypassing schema validation would hit this. Pre-existing behavior (moved, not introduced by this PR).
- `Promise.all` in `dataGcPreview` loses partial results on rejection (`src/libswamp/data/gc.ts:124`). If `previewVersionGarbage()` throws (e.g. `findAllGlobal()` fails), the already-resolved `findExpiredData()` result is lost. In practice both call `findAllGlobal()` so if one fails the other likely does too, and individual model errors within `previewVersionGarbage` are caught. Negligible risk.
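The zero-duration case can be reproduced with a small parser sketch. Only the regex grammar comes from the review; the function name, the capture group around the digits, the unit-to-millisecond table (including `m` as minutes, `mo` as 30 days, and `y` as 365 days), and the thrown error are assumptions and may not match `src/domain/data/duration.ts`.

```typescript
// Assumed unit table: minute, 30-day month, and 365-day year are guesses.
const UNIT_MS: Record<string, number> = {
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
  w: 7 * 86_400_000,
  mo: 30 * 86_400_000,
  y: 365 * 86_400_000,
};

// Same grammar as the regex quoted above, with the digits captured.
function parseDurationSketch(input: string): number {
  const match = /^(\d+)(mo|y|h|m|d|w)$/.exec(input);
  if (!match) {
    throw new Error(`invalid duration: ${input}`);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}

// "0d" matches the grammar, so it parses to 0 ms instead of being rejected.
const zeroDuration = parseDurationSketch("0d");
const oneDay = parseDurationSketch("1d");
```

Under these assumptions the parser itself accepts zero, so rejecting it has to happen in a surrounding validation layer, which is the role the review attributes to `GarbageCollectionSchema`.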
Verdict
PASS: The fix is correct and well-tested. The core bug (interactive/dry-run paths skipping Phase 2 version GC) is properly resolved. The `dryRun` plumbing through the repository is clean and the interface change is backwards-compatible. The `parseDuration` consolidation is a faithful extraction with no behavioral change. Test coverage is thorough across unit, integration, and presentation layers. The medium-severity items are informational inaccuracies in edge-case previews, not execution correctness issues.
Summary
Fixes #108: interactive `swamp data gc` silently skipped version-count GC when no lifetime-expired data existed, so resources with `lifetime: infinite, gc: N` accumulated unbounded versions. `--dry-run` under-reported the same work. Only `--force` and `--json` executed correctly.

Root cause. `dataGcPreview` surfaced only lifetime-expired entries via `findExpiredData`, and the CLI early-returned on an empty preview. Phase 2 version GC (`collectGarbage`) lived inside `deleteExpiredData` and was skipped entirely. Separately, the Phase 2 loop was gated on `if (!dryRun)`, so `--dry-run` never reported version GC counts.

Fix. Extend `UnifiedDataRepository.collectGarbage` with an optional `{ dryRun }` option that computes would-be counts without deleting. Expose it via a new `DataLifecycleService.previewVersionGarbage` method and aggregate it into `DataGcPreview.versionGcItems`. Widen the CLI early-return to check both categories. Remove the `if (!dryRun)` guard in `deleteExpiredData`, passing `dryRun` through to `collectGarbage` so `--dry-run` reports faithfully.

Cleanup. Consolidate two identical `parseDuration` copies into `src/domain/data/duration.ts`. (The grammar exists in four other places; deliberately out of scope for this PR.)

Design

`DataGcPreview` grows additively (`items` field preserved) so downstream consumers keep working. `collectGarbage`'s options arg is optional, so all `() =>` stubs in model/driver/validation tests continue to satisfy the interface without changes.

Test Plan

- `deno check`: clean
- `deno lint`: clean (1033 files)
- `deno fmt --check`: clean (1047 files)
- `deno run test`: 4345 passed, 0 failed
- `duration_test.ts`; `previewVersionGarbage` + dry-run pass-through in `data_lifecycle_service_test.ts`; aggregation in `gc_test.ts`; renderer output in new `data_gc_test.ts`
- `collectGarbage` dry-run in `data_versioning_test.ts`
- `/tmp/swamp-repro-issue-108` (12 versions at gc=10):
  - `--dry-run --json`: reports `versionsDeleted: 12` (was 0 before fix), no deletion
  - `--force`: prunes correctly, no regression

Follow-ups

- `systeminit/swamp-uat` for `tests/cli/data/gc_test.ts` (interactive + `--dry-run`) plus an adversarial test for large version counts (>10k versions on a single model).

🤖 Generated with Claude Code