feat(export): add --canonical flag for byte-reproducible snapshots#1015
feat(export): add --canonical flag for byte-reproducible snapshots#1015mmahut wants to merge 1 commit into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (7)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughThis PR introduces a ChangesCanonical Deterministic Export
Sequence Diagram(s)sequenceDiagram
participant User
participant ExportCmd as dolos data export
participant RebuildFn as rebuild_stores_canonical
participant ArchiveStore
participant StateStore as Fjall StateStore
participant IndexStore as Fjall IndexStore
participant Tar as GzTar Builder
User->>ExportCmd: --canonical flag set
ExportCmd->>RebuildFn: rebuild requested stores
RebuildFn->>ArchiveStore: rebuild_index_to(temp_file)
RebuildFn->>StateStore: rebuild_canonical(temp_dir)
RebuildFn->>IndexStore: rebuild_canonical(temp_dir)
RebuildFn-->>ExportCmd: CanonicalPaths (temp locations)
ExportCmd->>ExportCmd: shutdown live stores
ExportCmd->>Tar: append_archive_with_canonical_index<br/>(archive/ with canonical index substituted)
ExportCmd->>Tar: append state/ from canonical temp path
ExportCmd->>Tar: append index/ from canonical temp path
ExportCmd->>Tar: finish()
ExportCmd->>RebuildFn: drop(CanonicalPaths) → delete temp paths
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/cardano/src/rupd/work_unit.rs (1)
247-265: ⚡ Quick winAdd a byte-stability regression test for the new ordering paths.
This sort, together with the
BTreeSetswitch incrates/cardano/src/model/epochs.rs, is now carrying the reproducibility guarantee, but the provided tests still exercise roundtrips/default-empty collections more than byte equality. A focused test that builds the same logical rewards/pool set in different insertion orders and asserts identical encoded bytes would protect this feature from a quiet regression.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/cardano/src/rupd/work_unit.rs` around lines 247 - 265, Add a regression test that verifies byte-stable CBOR encoding of reward structures when the same logical rewards and pool data are built in different insertion orders. The test should create two Reward instances (using both Reward::MultiPool and Reward::PreAllegra variants) with identical pool/value pairs but inserted in different orders, then verify that after the sort_unstable_by_key operations on as_leader and as_delegator, the CBOR-encoded bytes are identical. This test protects the reproducibility guarantee provided by the sorting logic from silent regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/bin/dolos/data/export.rs`:
- Around line 228-255: The code currently allows the --canonical flag to work
with Fjall backends for both state and index stores, but Fjall produces
non-byte-identical outputs across runs which violates the reproducibility
contract. Replace the rebuild_canonical() calls in the StateStoreBackend::Fjall
and IndexStoreBackend::Fjall match arms with bail!() calls that reject the
unsupported combination, similar to how StateStoreBackend::Redb already rejects
--canonical with an appropriate error message. This ensures --canonical only
works with backends that guarantee byte-identical output.
- Around line 352-357: The NoOp archive backend error in the match statement is
currently unconditional and prevents all exports when using this backend, even
if the user only requested state or indexes without archive data. Gate this
error check on the `include_archive` flag so it only bails when archive data is
actually being exported, matching the behavior of the canonical mode path. Only
throw the bail error if the user explicitly requested archive export alongside
the NoOp backend.
---
Nitpick comments:
In `@crates/cardano/src/rupd/work_unit.rs`:
- Around line 247-265: Add a regression test that verifies byte-stable CBOR
encoding of reward structures when the same logical rewards and pool data are
built in different insertion orders. The test should create two Reward instances
(using both Reward::MultiPool and Reward::PreAllegra variants) with identical
pool/value pairs but inserted in different orders, then verify that after the
sort_unstable_by_key operations on as_leader and as_delegator, the CBOR-encoded
bytes are identical. This test protects the reproducibility guarantee provided
by the sorting logic from silent regressions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ac361e07-0a81-4378-bceb-2388e6bc3a4b
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
crates/cardano/src/model/epochs.rscrates/cardano/src/rupd/work_unit.rscrates/fjall/src/index/mod.rscrates/fjall/src/lib.rscrates/fjall/src/state/mod.rscrates/redb3/src/archive/mod.rssrc/bin/dolos/data/export.rs
b60045d to
52969eb
Compare
This is an attempt to make canonical snapshots reproducible as that two independent syncs to the same epoch should produce identical exports.
--canonicalflag: redb layout depends on allocation history, fjall SSTables embed per-run sequence numbers. The flag rebuilds each store into a fresh database before exporting — sorted insertion for redb, single-batch bulk write + major compaction for fjall. Live stores are not touched.Summary by CodeRabbit
Release Notes
New Features
--canonicalflag to the export command to produce byte-reproducible archives by rebuilding indexes into canonical temp copies before archiving.Improvements
Bug Fixes