feat(backup): gzip-compressed backups (~5x smaller) #8

Merged: InkyQuill merged 1 commit into main from feat/gzip-backups on Apr 19, 2026
Summary

Backups previously written to ~/.sedx/backups/<id>/filename.ext are now written to ~/.sedx/backups/<id>/filename.ext.gz. Streaming gzip via flate2::GzEncoder keeps memory flat on large files.

Real-world measurement (2000-line Rust source, 86,210 bytes):

  • Uncompressed backup: 86,210 bytes
  • Gzipped backup: 15,617 bytes (5.5× reduction)
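
The reduction figure follows directly from the two sizes. A small sketch of the arithmetic, with hypothetical helper names:

```rust
/// Compression ratio: original size divided by compressed size.
fn compression_ratio(original_bytes: u64, compressed_bytes: u64) -> f64 {
    original_bytes as f64 / compressed_bytes as f64
}

/// Percentage of space saved relative to the original size.
fn space_saved_percent(original_bytes: u64, compressed_bytes: u64) -> f64 {
    (1.0 - compressed_bytes as f64 / original_bytes as f64) * 100.0
}
```

With the sizes measured above, `compression_ratio(86_210, 15_617)` is about 5.52 and `space_saved_percent(86_210, 15_617)` is about 81.9.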

Logs and other structured text should compress even better; tiny configs will see more modest ratios, but still come out ahead, since gzip's fixed header and trailer add only about 20 bytes.

Compatibility

restore_backup auto-detects legacy uncompressed backups by checking for the .gz suffix on FileBackup::backup_path — pre-v1.1 backup directories remain fully restorable after upgrade. New regression test: restore_accepts_legacy_uncompressed_backup.
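
A minimal sketch of how a suffix check like this might look. The helper name `is_gzip_backup` is hypothetical, and the real `restore_backup` may structure the check differently; this only shows the idea of branching on the stored path's extension.

```rust
use std::path::Path;

/// Hypothetical helper: true when the stored backup path ends in `.gz`,
/// meaning the restore path should decompress; false for legacy
/// uncompressed backups, which can be copied back as-is.
fn is_gzip_backup(backup_path: &Path) -> bool {
    backup_path
        .extension()
        .map_or(false, |ext| ext == "gz")
}
```

Under this sketch, `filename.ext.gz` takes the decompression branch and a pre-v1.1 `filename.ext` takes the plain-copy branch.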

Why gzip (and not diff)

Full-file gzip keeps the same "always-restorable" guarantee the current uncompressed backups have. Diff-based backups would save more space but trade away reliability: manual edits between backup and rollback can invalidate the patch context. sedx's whole pitch is safety-by-default, so this PR takes the conservative-savings option.

Test plan

  • All 292 lib tests + 66 integration tests green
  • cargo clippy --all-targets -- -D warnings clean on both Linux and Windows cross-target
  • cargo fmt --check clean
  • End-to-end smoke test: create backup → edit file → rollback → content matches original (86KB → 15.6KB backup)
  • Legacy-uncompressed-backup restore works (unit test covers it)

Follow-ups not in scope

  • Content-addressable dedup across backups (same file backed up twice → stored once) would stack on top of this; considered separately.
  • Compression level is Compression::default() (level 6). Could expose a config knob if users want Compression::fast() or Compression::best(), but no evidence it's needed yet.

Backup files in ~/.sedx/backups/<id>/ are now gzipped with a .gz suffix.
On a real-world Rust source sample (86 KB) the backup shrinks to 15.6 KB
— a 5.5x reduction. Savings scale with file compressibility; on logs
and structured text the ratio is typically even better.

Streaming I/O through flate2's GzEncoder keeps peak memory flat for
large backups (same story as the main streaming file processor).

Backwards compatibility: restore_backup auto-detects legacy
uncompressed backups by checking for the .gz suffix on the stored
backup_path, so pre-v1.1 backups remain restorable. Covered by the new
restore_accepts_legacy_uncompressed_backup unit test.

Other backup-related tests updated to assert the .gz filename and to
verify content round-trips through decompression rather than reading
the backup as raw text.
@InkyQuill InkyQuill merged commit 51d3297 into main Apr 19, 2026
13 checks passed