switch ssv-node storage to Pebble DB#2709
Conversation
|
@greptileai please review this PR |
Greptile SummaryThis PR switches SSV node storage from BadgerDB to PebbleDB for both the node and exporter, introducing a new The one remaining finding worth noting is a minor resource leak in the truncation-recovery path of Confidence Score: 5/5Safe to merge — all previously flagged P0/P1 issues are resolved; the single remaining finding is a minor resource-leak edge case in the truncation-recovery error path. The code has gone through multiple review rounds and the critical issues (context propagation, Badger truncation recovery, temp-file atomicity, test cleanup, double-close) are all resolved. The one new finding (recoveredDB not closed on Close() failure) requires a disk-IO error during a crash-recovery code path to trigger and is self-healing on process restart. All remaining findings are P2 quality-of-life items that don't block correctness. storage/migration/db_migration.go — openBadgerReadOnly truncation-recovery close path (line 553) Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Node Startup] --> B[ResolvePebbleDBPlan]
B --> C{probePebbleLayout}
C --> D[DirState canonical path]
C --> E[DirState legacy path]
D & E --> F{hasBadgerFiles basePath}
F --> G{Single non-empty Pebble?}
G -- yes --> H[resolveSingleNonEmptyPebblePathPlan]
H --> I{badgerImportMarkerState}
I -- done marker exists --> J[Return plan, no BadgerImportPath]
I -- inProgress or badger files --> K[Return plan with BadgerImportPath]
G -- no --> L{Both Pebble paths empty}
L --> M[resolveEmptyLayoutPlan]
M --> N[Return plan with BadgerImportPath]
J --> O[pebble.New open DB]
K --> O
N --> O
O --> P{BadgerImportPath set?}
P -- no --> Q[applyMigrations]
P -- yes --> R[MigrateBadgerToPebbleIfNeeded]
R --> S{doneMarkerExists?}
S -- yes --> T[Verify pebble non-empty, return]
S -- no --> U[badgerDirState open Badger once]
U --> V{Badger non-empty?}
V -- no --> W[Skip migration]
V -- yes --> X[runBadgerImport]
X --> Y[Write inProgress marker]
Y --> Z[copyBadgerToPebble batched with ctx check]
Z --> AA[Write done marker]
AA --> BB[Remove inProgress marker]
BB --> Q
T --> Q
W --> Q
Q --> CC[DB ready]
Reviews (23): Last reviewed commit: "simplify migration code" | Re-trigger Greptile |
Codecov Report❌ Patch coverage is ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@greptileai please review it again |
|
@greptileai review it again |
Additional Comments (2)
If
If |
|
@greptileai review this again, considering all comments on this PR |
|
@greptileai review this again, considering all comments on this PR |
|
@greptileai review this again, considering all comments on this PR |
|
@greptileai review this again, considering all comments on this PR |
|
@greptileai review this again, considering all comments on this PR |
Staging Test Report (operators 93-96, BadgerDB → PebbleDB migration)Branch: Migration: PASSAll 4 nodes migrated from BadgerDB to PebbleDB on startup:
All 9 schema migrations preserved. Logs confirm Memory: 55% REDUCTION
Goroutines: ~100 fewer per node
Likely from eliminated BadgerDB background GC goroutines. Functional Correctness: PASS
Errors: PASS (all pre-existing)
P2P Peers: ConvergingPebbleDB nodes at 3-9 peers (just restarted), baseline at 10-11. Expected to converge. Note on pathNodes log a WARN: Next Steps
Tested via SSV Scout |
olegshmuelov
left a comment
There was a problem hiding this comment.
LGTM!
One thing for release notes @y0sher: the migration requires roughly 2x the current database size in free disk space (both Badger and Pebble coexist during and after migration). Operators with large DBs should be warned before upgrading. The code handles ENOSPC gracefully (clear error, resumable), but better to warn proactively than have operators discover it at runtime.
There was a problem hiding this comment.
Looks quite comprehensive, left some minor suggestions.
Although I'm struggling a bit to get the full picture, it maybe feels like the intermediary PebbleDBPlan concept complicates things more than helps (or maybe it just could use some restructuring) ... I get the idea to separate the "parsing of the current state/plan" (represented by PebbleDBPlan) from the "data-migration execution" ... it's just a bit hard to map it onto a simple list of scenarios I would expect us to have to implement such a migration, which would be something like:
- pebble exists, badger does not -> just use pebble
- pebble exists, badger exists -> resolve which one is the source of truth and finish the migration if needed
- pebble doesn't exist, badger exists -> run the migration
- ...
maybe the code in migrateBadgerToPebbleIfNeeded func could be restructured a bit & supplied with step-by-step comments to clarify where/when each of those cases (from my list above) takes place - currently it kinda mixed a lot of things together making it hard to see what's going on
* pebble-db: define temporary DB * remane
|
@greptileai please review the latest changes |
# Conflicts: # cli/operator/node_test.go # identity/store_test.go
# Conflicts: # network/p2p/testutils.go
# Conflicts: # ssvsigner/e2e/testenv/environment.go # ssvsigner/e2e/testenv/key_managers.go # ssvsigner/ekm/local_key_manager_test.go # ssvsigner/ekm/signer_storage_test.go # ssvsigner/ekm/slashing_protector_test.go # ssvsigner/go.mod # ssvsigner/go.sum
iurii-ssv
left a comment
There was a problem hiding this comment.
Looks much cleaner! Left some minor suggestions.
Also, we should probably create a task to "remove the Badger directory once the done marker has been present for, say, 24 hours" ?
| func ResolveDBLayout(basePath string) (DBLayout, error) { | ||
| layout, err := probePebbleLayout(basePath) | ||
| if err != nil { | ||
| return DBLayout{}, err | ||
| } | ||
| if err := validateEmptyPebbleDoneMarkers(basePath, layout); err != nil { | ||
| return DBLayout{}, err | ||
| } | ||
| badgerFilesPresent, err := hasBadgerFiles(basePath) | ||
| if err != nil { | ||
| return DBLayout{}, fmt.Errorf("check badger path %q: %w", basePath, err) | ||
| } |
There was a problem hiding this comment.
Badger opened on every post-migration startup. Even after the done marker is written, badgerDirState still stats the legacy directory and (depending on branch) opens it read-only. For already-migrated nodes this is pure IO waste — the done marker alone should short-circuit the layout resolution. Consider returning early in ResolveDBLayout the moment the done marker is observed.
There was a problem hiding this comment.
@iurii-ssv I'm not sure I understood the part about badgerDirState, can you please clarify it?
| } | ||
|
|
||
| if layout.BadgerImportPath != "" { | ||
| migrated, migratedKeys, err := storagemigration.MigrateBadgerToPebbleIfNeeded( |
There was a problem hiding this comment.
During import, both Badger and Pebble directories coexist. This is mentioned in the PR description but not in user-facing startup logs; operators on tight disks will discover this at the worst moment. Suggest a preflight statfs check at copyBadgerToPebble entry with a clear error if free space < badger size × 1.2.
Probably a good idea to add a check like this.
There was a problem hiding this comment.
@iurii-ssv I had this check, but I decided to remove it because it's not reliable and the code is already complicated. Anyway, if migration fails due to a lack of space, it should interrupt the migration
There was a problem hiding this comment.
Why do you think it's not reliable ?
I think it makes sense to check the available space not only to make sure the migration succeeds, but also to ensure the SSV node is able to keep functioning after the migration too (e.g if we have 20 GB of space on disk with Badger occupying 10GB, and newly created Pebble occupying 9.5GB after migration is done ... we have 0.5GB left to work with - which might result in a failure shortly after, even before the Operator can spot & resolve this manually), right ?
I don't think we need to do the "space check" on every migration as a habit (although we maybe could add this as a generic sanity-check on startup? So we fail to start the node cleanly, instead of letting it run out of disk-space), but for this one specifically seems like it might prevent some pain for Operators.
There was a problem hiding this comment.
I don't think it's reliable because we don't know how much space the Pebble DB will take. If we check for 1.2x of Badger, but Pebble uses 1.5x, then the check will be useless. If we check for 1.2x but Pebble uses 0.5x, the check could prevent migration even with enough space. Even if the size is roughly the same, but a disk has 15% extra space, not 20%, it might be fine, because we don't need Pebble DB after successful migration.
There was a problem hiding this comment.
We could probably estimate the relative size (Badger vs Pebble) by running this migration once on some operator from our stage env (so we'll know what *.*x value is reasonable to use).
I mean, I agree that it's a heuristic/judgement that might result in false-positives ... I just think it's worthwhile even so.
| tmpPath := path + ".tmp" | ||
| // #nosec G304 -- marker path is derived from configured DB path plus fixed marker filenames. | ||
| tempFile, err := os.OpenFile(tmpPath, os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0o600) | ||
| if err != nil { | ||
| return fmt.Errorf("open marker temp file: %w", err) | ||
| } |
There was a problem hiding this comment.
Nit: writeMarker does not os.Remove the temp file if fsync or rename fails partway, leaving a stray .tmp next to the marker. defer os.Remove(tmp) after the initial create, cleared on success, would be robust.
There was a problem hiding this comment.
What paths do you mean where it's not removed?
| if !inProgressMarkerExists { | ||
| if err := writeBadgerImportInProgressMarker(pebblePath, badgerPath); err != nil { | ||
| return false, 0, wrapNoSpaceImportError(err, badgerPath, pebblePath) | ||
| } | ||
| } |
There was a problem hiding this comment.
I'm not sure I understand what the purpose of InProgressMarker is:
- if it is it to prevent concurrent migrations from running (multiple SSV nodes accessing the same DB) ... it doesn't seem like it works for that
- if it is to "continue from where we left off", it doesn't track where we stopped (and it's probably an overkill for this anyway)
so it kinda seems like we could just use the DoneMarker alone and get the same results ?
There was a problem hiding this comment.
I'm not sure DoneMarker alone would be enough here. Historically we could already have both DBs populated without any migration marker at all, because Pebble was used under cfg.DBOptions.Path + "-pebble" while Badger stayed at cfg.DBOptions.Path. InProgressMarker is what lets us distinguish that kind of overlap from an actual interrupted migration that is safe to resume.
momosh-ssv
left a comment
There was a problem hiding this comment.
One more that didn't fit inline (DropPrefix sits outside the diff hunks):
storage/pebble/pebble.go DropPrefix — could this lead to keys persisting after the call returns? Seems that the Badger version ran inside a transaction, while the Pebble version here uses a non-indexed batch plus a live-DB iterator, so writes that race with the loop can persist after DropPrefix returns. Worth considering db.DeleteRange(prefix, upperBound, pebble.Sync) — single atomic operation, and faster too.
|
|
||
| if err := applyMigrations(logger, beaconConfig, operatorPrivKey, db, dbPath); err != nil { | ||
| return nil, fmt.Errorf("apply migrations: %w", err) | ||
| if err := applyMigrations(logger, beaconConfig, operatorPrivKey, db, layout.PebblePath); err != nil { |
There was a problem hiding this comment.
What do you think about deferring the done-marker write until after applyMigrations succeeds? I fear that if applyMigrations crashes mid-run on a fresh import, the next start sees the done marker present plus a half-transformed Pebble. The resolver then trusts Pebble as the source of truth, and recovery depends on each schema migration being individually idempotent — which is a per-author guarantee rather than a structural one. Either move the done-marker write after applyMigrations, or add a separate schema-applied sentinel and refuse to start when the done marker exists but the schema sentinel doesn't.
| return marker.BadgerDirPath == "" || samePath(marker.BadgerDirPath, badgerPath), nil | ||
| } | ||
|
|
||
| func samePath(left string, right string) bool { |
There was a problem hiding this comment.
Could this lead to spurious re-imports on operator setups that bind-mount or symlink the data dir? Seems that samePath only does Clean+Abs but not symlink resolution, so e.g. /var/lib/ssv -> /mnt/data/ssv will produce a different absolute path on subsequent runs, markerExistsForSource returns false, and the migration tries to re-import on top of an already-imported Pebble. Maybe worth using filepath.EvalSymlinks with a fallback to Clean+Abs when EvalSymlinks fails (e.g. path doesn't exist yet).
| dbPath := cfg.DBOptions.Path + "-pebble" // opinionated approach to avoid corrupting old db location | ||
|
|
||
| db, err := pebble.New(logger, dbPath, &cockroachdb.Options{}) | ||
| db, err := pebble.New(logger, layout.PebblePath, &cockroachdb.Options{}) |
There was a problem hiding this comment.
Any reason we're not wiring zap into Pebble's Options here? I fear this is an observability regression vs Badger — Pebble's WriteStallBegin/End, DiskSlow, compaction stalls and manifest recovery events would otherwise go nowhere, and migrating the operator-facing data layer makes those signals more important, not less. Could we add a pebble.Logger adapter over zap on Options.Logger, an EventListener for stall/slow events, and expose db.Metrics() via the existing metrics handler?
The PR applies Pebble DB usage for both ssv-node and exporter instead of Badger DB for ssv-node and Pebble DB for exporter.