Skip to content

Unified ingestion workflow: streaming ingestion daemon#795

Draft
chowbao wants to merge 32 commits into
feature/full-historyfrom
streaming-ingestion-daemon
Draft

Unified ingestion workflow: streaming ingestion daemon#795
chowbao wants to merge 32 commits into
feature/full-historyfrom
streaming-ingestion-daemon

Conversation

@chowbao

@chowbao chowbao commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Implements the unified ingestion workflow streaming daemon — the orchestration layer from design #722 (design-docs/full-history-streaming-workflow.md + gettransaction-full-history-design.md), part of the #777 RPC v2 roadmap. Base: feature/full-history.

What this adds

A daemon under cmd/stellar-rpc/internal/fullhistory/streaming/ that orchestrates the existing merged stores (issues 1–20 of the implementation breakdown):

  • Catalog (meta-store) key schema + the one-write protocol (mark→fsync file+dirent→flip), with a strict key↔path bijection and freezing|frozen|pruning / transient|ready states.
  • Catch-up on startup, live ingestion from captive core (indexed GetLedger, one atomic synced WriteBatch per ledger across all CFs), and the freeze → rebuild → discard → prune lifecycle tick.
  • Per-chunk hot DB as a single multi-CF RocksDB; derived progress (resume point recomputed from durable state at startup, never stored).
  • Postcondition resolver + executor (pure catalog diff → bounded worker pool, success-signalling done-channels); the cold tx-hash rolling-rebuild + coverage protocol (layers coverage keys + commit batch on TxHash cold store — streamhash index (build + read) #728's single-index build).
  • Surgical recovery (atomic catalog key-demotion, no filesystem surgery) + hot-volume-loss detection; the audit command (INV-1…4, incl. optional deep byte-compare); Prometheus metrics + structured logging.

Composes (separately tracked — not reimplemented)

#765 (ingest primitives), #728/#729 (tx-hash cold/hot), #764 (XDR extractors), #695/#739/#740/#756 (ledger/event stores), pkg/{chunk,rocksdb,stores/metastore}, internal/packfile.

Scope boundaries / intentionally deferred

Tests

  • Full fullhistory tree green on the non-short suite (RocksDB cgo). Crash-injection / convergence suite (every injected state converges to INV-1∧2∧3∧4 via audit), in-process E2E (first-start / freeze / prune / restart-resume re-derivation / multi-window tx-hash lookup + cross-window false-positive rejection), and .bin/.idx byte-format-identity tests against the merged cold path.
  • Reconciled to the current design revision (c586667a) and reviewed across concurrency / test-intent / design-faithfulness lenses: no blockers.

Reviewer notes

  • The full cmd binary requires the pre-existing make build-libpreflight (rust FFI) to link; the Go code all compiles.
  • Built against RocksDB 10.9.1 (grocksdb 1.10.7).
  • History is per-issue/phase (28 commits) for reviewability rather than squashed.

chowbao added 30 commits June 18, 2026 01:54
…+ sweeps

Crash-safety tests now fault-inject inside the real catalog methods rather
than hand-replaying their steps, so the load-bearing operation ORDER is what
is verified:

- crashHooks (hooks.go): nil-in-production fault-injection points fired from
  inside SweepChunkArtifacts, SweepIndexKey, and CommitIndex.
- EXIT invariant (key absent => file gone): beforeKeyDelete fires after the
  durable unlink and before the key delete; the test asserts file-gone +
  key-present there. Reordering the key delete ahead of the unlink turns the
  test red.
- Never-unlink-under-a-frozen-key: beforeUnlink fires after the frozen->pruning
  demote and before the unlink; the test asserts the value is pruning (not
  frozen) and the file is still present. Dropping the demote turns it red.
  Asserted for both the per-chunk and index sweeps.
- CommitIndex atomicity: failCommitBatch forces the batch callback to error so
  metastore drops the whole batch; the test asserts none of promote/demote/
  terminal-txhash writes are observable. Rewriting the commit as separate Puts
  turns it red.
- CommitIndex re-commit idempotency: exercises the prev.Key == cov.Key branch;
  a second commit on the same coverage leaves exactly one frozen coverage and
  demotes nothing against itself.
Implement the Phase B catch-up primitives that materialize cold
artifacts for one chunk:

- processChunk: per-kind one-write protocol (skip frozen kinds; mark
  freezing -> drive the merged cold ingestion -> fsync file+dirents ->
  flip frozen), reusing the cold ingester set via the new
  ingest.RunColdChunk (single chunk, explicit per-kind output roots so
  the streaming layout's txhash/raw path is honored without re-deriving
  any extractor or writer).
- catchupSource: rule 2 source-preference order — complete hot tier
  (DECISION (b): MIN across the three independent per-chunk hot stores'
  last-committed seq >= chunkLastLedger) -> frozen local .pack when lfs
  is not requested -> bulk backend behind a bounded coverage wait. Hot
  loss (ready key, missing/unopenable dir) is the case-4 fatal
  (ErrHotVolumeLost); incomplete-but-present is staleness and falls
  through.
- ArtifactSet kind subset; rocksHotProbe production probe + a
  hot-ledger-backed ChunkSource; pollingBackendWaiter.

Tests (real RocksDB + temp dirs; fake ChunkSource/HotProbe/waiter):
three artifacts produced and keys flipped frozen, idempotent skip,
re-materialization after a freezing crash, the min-of-three gate, and
the loss-vs-staleness split.
…F DB; atomic WriteBatch/ledger (decision a)

The hot tier was three independent per-chunk RocksDB stores (ledger,
events, txhash) committed concurrently via an errgroup fan-out, so
"complete" meant a min-of-three reconciliation and a ledger could be
partially present across stores. Decision (a) makes the hot tier ONE
per-chunk RocksDB instance whose ledgers / events / txhash data are
column families of a single store, with each ledger committed as ONE
atomic synced WriteBatch across ALL CFs — a ledger is fully present or
fully absent, and there is a single authoritative watermark (the
ledgers CF's last key).

- New pkg/stores/hotchunk: opens one rocksdb.Store with the union of
  CFs (ledgers + events' 3 + txhash's 16, non-colliding names), composes
  the three typed facades over it, and exposes IngestLedger (one atomic
  synced batch per ledger) plus MaxCommittedSeq (single watermark).
- ledger/txhash/eventstore HotStores gain NewWithStore (wrap a shared
  store, no DB ownership) + per-ledger batch-append helpers; their
  standalone openers and read APIs are unchanged. Ledger data moves from
  the default CF to a named "ledgers" CF. eventstore splits its ingest
  into prepare -> queue -> commit -> apply so the mirror/offsets update
  runs only after the shared batch is durable.
- ingest HotService/RunHot/HotStores drive the shared DB: one atomic
  write per ledger, no errgroup fan-out, no concurrent per-store commits.
- streaming hotsource/process/execute open the one shared DB; the
  completeness gate is the single maxCommittedSeq >= chunkLastLedger
  (min-of-three removed).
- Tests: hotchunk atomicity (a rejected/mid-batch-failed ledger persists
  nothing across any CF; the watermark advances only on full commit),
  watermark authority, read-behavior preservation; adapted the ledger/
  events/txhash hot-store, ingest, and streaming tests to the shared-DB
  model. Whole fullhistory tree builds, vets, and tests green (incl.
  -race on stores + ingest).
…CompleteThrough)

Implement the two progress derivations on the committed Phase A+B substrate,
under decision (a) (single hot-tier MaxCommittedSeq):

- deriveCompleteThrough: chunk-granularity bound for the lifecycle tick. Maxes a
  cold term (highestDurableChunk: lfs+events frozen AND txhash frozen-or-index-
  covered), a positional term (count-only-ready: max ready hot chunk - 1), and
  the earliest-ledger floor.
- deriveWatermark: deriveCompleteThrough + one MaxCommittedSeq refinement read of
  the highest ready hot DB (sub-chunk precision + boundary-crash recovery), with
  a per-ready-key dir-existence fatal loop (ErrHotVolumeLost / case 4).

Sentinel-underflow guard: all 'highest complete chunk' arithmetic runs in int64
with a -1 pre-genesis sentinel; completeThrough maps negatives to FirstLedgerSeq-1
and never feeds an underflowed uint32 into chunk.ID.LastLedger(). Fresh store,
live-chunk-0, and absent/genesis earliest pin all derive FirstLedgerSeq-1 instead
of MaxUint32.

Table-driven tests cover every term, the count-only-ready exclusion of transient
keys, the boundary-crash refinement, count-only-ready empty-DB fallback, the
fatal-on-missing-dir loop over every ready key, and the underflow guards (one
test fails on a naive uint32 completeThrough). Tighten the stale process.go
comment now that deriveWatermark exists.
Add the DECISION (a) hot-DB ingestion loop and its hot:chunk bracket
helpers under internal/fullhistory/streaming:

- openHotTierForChunk: open/recover/create the ONE shared per-chunk
  multi-CF hot DB under the Phase A hot:chunk bracket (transient ->
  create + fsync dir + parent -> ready). A "ready" key whose dir is
  missing is case-4 hot-volume loss (ErrHotVolumeLost), never healed.
- discardHotTierForChunk: retire the bracket (transient -> rmdir +
  fsync parent -> delete key); idempotent, crash-safe re-run.
- runIngestionLoop: drive an injected ledgerbackend.LedgerStream into
  the hot DB, committing each ledger as ONE atomic synced WriteBatch
  across all CFs. At a chunk boundary it CLOSES the just-filled DB
  BEFORE creating chunk C+1's key (the handoff fence that makes C
  visibly complete), then rings the payload-free size-1 coalescing
  doorbell. ctx-cancel / shutdown-driven stream close -> nil; an
  unexpected stream close or any ingest failure -> error (supervisor
  restarts). No progress variable: the last synced batch is the
  watermark, re-derived at startup.

Adds a beforeHotTransient crash hook (fired inside PutHotTransient) so
the boundary close-before-create-key order is asserted from inside the
real path, not a hand-replayed sequence.

Tests (real hotchunk DB + a fake LedgerStream): a ledger lands
atomically across all CFs; the boundary closes the DB before creating
C+1's key; the doorbell coalesces without blocking; ctx-cancel returns
nil; an unexpected/errored close returns an error; restart resumes
idempotently from the derived watermark.
Two coverage gaps on the derived-progress code (1e25396):

1. completeThrough sentinel-underflow guard was effectively untested at
   any level but the lone -100 unit case. The production sentinel -1
   ALIASES the guard-less uint32 wrap: chunk.ID(MaxUint32).LastLedger()
   overflows (MaxUint32+1 -> 0) to exactly 1 == preGenesisLedger, so a
   dropped 'c<0' guard leaves every -1-path assertion green. Add a -2
   row (whose guard-less wrap is 4294957297, NOT aliasing) plus a direct
   assertion contrasting guardlessWrap(-1)==preGenesisLedger against
   guardlessWrap(-2)!=preGenesisLedger, documenting the trap so the
   guard is genuinely exercised. Mutation-verified: removing the guard
   now fails the -2 and -100 cases.

2. The single-DB MaxCommittedSeq refinement was only ever read through
   fakeHotProbe (a canned constant ignoring its chunk-id arg), so the
   production rocksHotProbe -> hotchunk.DB -> ledgers-CF last-key
   round-trip was uncovered end-to-end. Add progress_realdb_test.go
   driving the real probe: RefinementIsNotStale (bound rises to the real
   committed frontier), OpensHighestReady (refines the highest ready
   chunk's DB, not ready[0]), EmptyLiveFallsBack (empty live DB -> no
   fabricated frontier). Mutation-verified: a stale/constant
   MaxCommittedSeq fails the first two.
…process locking

Address review findings on the Config work:
- ParseConfig now decodes strictly (go-toml v1 Decoder.Strict(true)) so an
  unknown/typo'd key is rejected instead of silently falling back to a default.
  This matches the LoadConfig docstring and prevents a typo in an immutable,
  layout-defining key (chunks_per_txhash_index, earliest_ledger) from pinning
  the wrong value on first start. Add table-driven TestParseConfig_RejectsUnknownKeys.
- Restart-immutability tests now read both layout pins straight back from the
  live metastore after each first-start and restart call (requirePins helper),
  and assert a successful/aborted restart MUTATES NOTHING. This kills the
  corrupt-re-pin mutation (a restart returning the right value but rewriting a
  wrong pin) and makes any metastore read-visibility anomaly surface loudly as
  a missed-pin failure rather than a downstream nil error.
…on contract

Add the storage-side reader-retention contract as an explicit, tested
gate (RetentionGate / seqWithinRetention): a seq below the effective
retention floor is not-found regardless of on-disk state. This is the
property the prune and sweep stages rely on to unlink unilaterally
without coordinating with the index lifecycle. Wire the discard scan's
past-retention test through the gate's ChunkBelowFloor so the reader and
the lifecycle share one definition of the floor.

Tests cover the four retention scenarios end to end against production
code: widening at the next startup re-derives the wider [lo', last]
coverage (resolve emits the wider terminal IndexBuild + .bin
re-materialization for fully-pruned chunks; CommitIndex demotes the old
coverage), driven both directly and through catch-up's runBackfill;
shortening raises the floor immediately and the prune tick sweeps the
newly-out-of-range chunks (keys + files); a window straddling the floor
serves its in-range tail while below-floor seqs are not-found and its
below-floor chunk artifacts are pruned but its .idx is kept; and the
prune scan's redundant-input branch cleans the frozen-and-freezing
chunk:c:txhash keys a widened-then-narrowed window leaves behind.
…ling

Add the surgical-recovery operation (design Scenario coverage cases 3 and
4): one atomic Catalog key-demotion batch that demotes tainted cold
artifacts (chunk:{c}:* and every overlapping index:* key) to "freezing"
and tainted/lost hot keys (the live chunk's included) to "transient".

- PlanSurgicalRecovery computes the exact in-range / overlapping key set
  from a catalog snapshot, never conjuring absent keys.
- ApplySurgicalRecovery commits all demotions in one synced metastore
  batch; re-running is a no-op (idempotent overwrite to fixed values).
- RunSurgicalRecovery is the operator entrypoint: it takes every
  storage-root flock (failing fast with ErrRootLocked against a running
  daemon -> stopped-daemon-only), reopens the meta store, plans+applies,
  and releases. Carries a runbook comment.
- Self-correcting watermark: demoting hot keys regresses deriveWatermark
  to the last frozen boundary (transient keys are excluded from the
  positional/refinement terms); catch-up + forward re-ingest heal, no
  manual rewind. The case-4 fatal (ready hot key, missing dir) in
  deriveWatermark/openHotTierForChunk is verified, not re-implemented.

Tests (cgo): atomic+idempotent demotion, cold/index/hot scoping, hot-only
(case 4) leaving cold untouched, index-overlap boundaries, watermark
regression to last frozen boundary, watermark-unchanged below the live
chunk, cold re-derivation signal, both case-4 fatal sites, and the
operator entrypoint (lock refusal + happy path).
Implement the design's 'audit admin command' (Correctness, line 1364) as
Catalog.Audit + the read-only RunAudit operator entrypoint. The audit
composes the catalog's key-walking primitives and a filesystem walk against
the layout bijection; it never reaches into the phase scans that MAINTAIN
the invariants, so a bug in any scan surfaces here as a real violation.

- INV-2 (single canonical state): walk meta keys, cross-check the four
  forbidden co-existences (two frozen index keys per window; a freezing/
  pruning artifact key surviving quiescence; an orphan hot key for a
  fully-served chunk; a per-chunk txhash key in a finalized window).
  Excludes exactly the two transients the design tolerates: a 'transient'
  hot key, and a 'freezing' artifact key strictly above completeThrough
  (the hot-volume-loss tail no source can yet repair).
- INV-3 (disk<->meta): walk both directions — orphan files / duplicate
  artifacts / orphan hot dirs (disk->meta) and dangling keys (meta->disk),
  tolerating the mid-sweep 'pruning'-key-no-file window and 'transient'
  hot-key-no-dir bracket.
- INV-4 (retention bound): walk keys vs effectiveRetentionFloor, flagging
  only ranges WHOLLY below the floor (a straddling window is masked by the
  reader retention contract, not pruned).
- INV-1 (read correctness): optional deep mode re-derives sampled frozen
  artifacts via an injected conformant-LedgerBackend DeepDeriver and
  byte-compares; skipped when no deriver is supplied.

Tests (cgo): clean store; each INV-2 forbidden co-existence and both
tolerated transients; INV-3 orphan/duplicate/dangling/orphan-hot-dir plus
the tolerated pruning-no-file; INV-4 below-floor vs straddling; INV-1 deep
match/mismatch/declined/error/no-deriver.
…zen window also has hot/txhash keys

Clauses 3 and 4 of auditSingleCanonicalState routed through
Catalog.FrozenCoverage (via pendingArtifacts->indexCovers and
txhashRedundantInFinalizedWindow), which ERRORS when a window holds two
frozen index keys. So a store with a two-frozen INV-2 breach AND a hot
key or per-chunk txhash key in that window aborted the whole audit with
a non-I/O error: Audit returned (AuditReport{}, err), discarding the
clause-1 violation and any INV-3/INV-4 findings, and the zero-value
report's Clean() reads true. This contradicted Audit's 'error only for
I/O, never for a violation' contract and the 'report every breach' goal,
making the audit least useful on a multiply-corrupted store.

Clauses 3 and 4 now read a duplicate-tolerant frozen-coverage view
(auditPendingArtifacts over the frozenCoverageContains predicate;
auditTerminalCoverage over the per-window frozen map built for clause 1)
— the same all-scan-keep-frozen approach clause 1 and deriveCompleteThrough
already rely on. The two-frozen-keys case stays a recorded clause-1 INV-2
violation and the audit finishes the full INV-2/INV-3/INV-4 walk. The
production sweeps in eligibility.go keep the strict FrozenCoverage path.

Adds a regression test (window with two frozen keys + orphan hot key +
leftover txhash key) asserting err==nil and >=3 INV-2 violations.
…onfig wiring

Add RunDaemon(ctx, configPath): the full-history streaming daemon's process
entrypoint. It loads the TOML config, locks every configured storage root
(single-process flock), opens the meta store + binds the Catalog, runs
validateConfig (pins the immutable layout, resolves the earliest_ledger floor),
builds the production external boundaries, and runs a supervised startStreaming
loop that restarts on a restartable error and surfaces the fatal sentinels
(ErrHotVolumeLost, ErrFirstStartNoTip).

Boundaries are injected (DaemonOptions.BuildBoundaries) so the whole flow is
unit-tested against fakes without captive core or a real object store. The
production builder wires the captive-core CoreStreamOpener seam and a
LedgerBackend-backed NetworkTip/BackendWaiter adapter (backendTip). A thin
full-history-streaming cobra subcommand launches it from cmd/stellar-rpc.

Deferred to #772 (the SQLite -> full-history cutover): the captive-core
CaptiveCoreConfig plumbing (binary path, passphrase, archive URLs) and the lake
tip resolution are still entangled with the v1 daemon config, and ServeReads is
a no-op until the read path flips. The injected interfaces are final; only the
config plumbing is deferred, with TODO(#772) markers at each flip point. The v1
SQLite ingestion/preflight path in cmd/.../internal/daemon is untouched.
…path

Layout derived every artifact and hot path from a single DataDir root,
ignoring the [meta_store]/[immutable_storage.*]/[streaming.hot_storage]
overrides that ResolvePaths applies and LockRoots flocks. An operator
setting e.g. [streaming.hot_storage].path flocked the override dir while
the daemon wrote the only copy of recently-ingested ledgers under
{DataDir}/hot — the override was silently ignored and the single-process
flock guarded the wrong location.

Make Layout the single source of truth for storage paths: hold one root
per artifact tree (meta/hot/ledgers/events/txhash_raw/txhash_index) and
add NewLayoutFromPaths(paths) to bind those roots from the RESOLVED Paths.
NewLayout(root) is kept as the all-under-one-dir convenience (identical to
the no-override resolve). daemon.go, audit.go, and recovery.go now bind via
NewLayoutFromPaths(paths) so the locked roots and the data location are the
same. audit's filesystem walks read the per-tree roots off the Layout
instead of recomposing them from a removed Root().

Add TestRunDaemon_StoragePathOverridesHonored: with every tree overridden
onto a distinct mount it asserts the bound Layout resolves under the
overrides and that opening a hot DB via openHotTierForChunk lands under the
hot override with nothing under {DataDir}/hot (fails against the old
DataDir-derived Layout).
…ag test coverage

The IngestionLag doc-comment promised the ingestion loop refreshes the lag
gauge at each chunk boundary, but runIngestionLoop never calls IngestionLag —
its sole call site is catchUp. Once catch-up converged, ingestion_lag_ledgers
froze at its last catch-up value for the daemon's whole ingesting life, and the
watermark gauge only moved on a chunk-boundary tick (~LedgersPerChunk apart), so
there was no moving health signal between boundaries.

- Add a LastCommitted(seq) gauge (last_committed_ledger), refreshed per ledger
  in runIngestionLoop after each synced WriteBatch — a real per-ledger liveness
  signal that detects a wedged/slow ingester between chunk boundaries. The loop
  holds no network tip, so it deliberately does NOT touch IngestionLag.
- Correct the IngestionLag doc to state it is a catch-up-only signal that
  freezes by design once catch-up converges; point operators at LastCommitted.
- Tests: assert the boundary test moves last_committed (==last ledger, once per
  ledger) and that the loop never touches IngestionLag; add log-capture tests
  (logger.StartTest) asserting the ingestion-boundary and lifecycle freeze/
  snapshot log lines' structured keys, values, and levels — the commit had zero
  log assertions before.
…e-4 tick is a no-op

Strengthen the crash-injection/convergence suite's self-documentation after an
adversarial mutation-testing review:

- Add a CAVEAT to the suite header recording which cases genuinely exercise
  convergence (reach the tick from a DIRTY audit state and mutate durable keys)
  vs the one deliberate no-op, HotVolumeLossCase4, whose convergence value is the
  ErrHotVolumeLost fatal + watermark healing rather than tick repair. Also note
  INV-1 is asserted only structurally here (deep byte-compare is audit_test.go's).
- Strengthen HotVolumeLossCase4: assert the post-recovery store is ALREADY
  INV-1..4 clean before the tick and that the tick is a verified key-level no-op,
  making the 'recovery is pure key demotion' claim load-bearing.

Verified green with -race.
Drive the whole streaming-daemon lifecycle in one process against the real
stores and a FAKE/synthetic ledger source through the true entrypoint
(RunDaemonWith): first start (config load -> per-root flock -> validateConfig
pins the genesis floor -> supervised startStreaming) -> direct ingest across
two real chunk boundaries -> the lifecycle ticks freeze each just-closed
chunk's cold artifacts, fold its terminal txhash index, and discard its hot
tier -> a getTransaction-style hash->seq lookup resolves from the cold .idx
(frozen chunk) AND from the live hot CF (un-frozen live chunk) -> clean
shutdown -> RESTART re-derives the watermark and resumes captive core at
watermark+1 with no gap -> a retention_chunks=1 run prunes the now-past-floor
chunk (pruned coverage => not-found) while the floor chunk survives -> finish
with Catalog.Audit (INV-1..4) => Clean.

The ledger SOURCE is the only thing faked: captive core and the bulk backend
cross their injected interfaces (CoreStreamOpener / NetworkTipBackend), fed
well-formed synthetic LedgerCloseMeta built from the merged-store fixtures
(one-tx LCM where a real network-hashed tx hash is needed). A full captive-core
+ docker-stellar-core E2E is a documented follow-up requiring infra not
available here (the integrationtest harness + the #772 read cutover).

Also: exclude the per-root flock file (lockFileName) from the audit's INV-3
orphan-file walk so an audit of a real (or cleanly-stopped) deployment whose
storage roots hold the daemon's own locks is not falsely flagged.
…ectations

Pin the tx-hash cold-index format the streaming rebuild produces to the
merged #728/#780 cold path, and record the design's Part-4 perf figures.

perf_test.go:
- TestStreamingRebuild_ByteIdenticalToColdPath builds the SAME coverage via
  the streaming buildTxhashIndex and a direct txhash.BuildColdIndex over the
  same .bin inputs, asserting the two .idx files are byte-identical -- the
  precondition that lets the bench-fullhistory figures transfer.
- TestStreamingBin_MatchesSpecFormat / TestStreamingIdx_MatchesSpecFormat pin
  the on-disk formats to gettransaction sec 6.1/6.2 (16-byte key, 3-byte
  payload offset from MinLedger, 1-byte fingerprint, uint64-LE count header,
  [MinLedger,MaxLedger] metadata).
- TestColdIndexSizing_ConsistentWithPart4 asserts a B/tx sanity band around
  the design's ~4.2 B/tx and the inviolable 4 B/tx payload+fingerprint floor.

PERF.md records the expected figures (~1-min dense-window rebuild, ~4.2 B/tx
index, ~60 GB .bin floor) and points at bench-fullhistory on rpc-hack as the
measurement source -- transferred because the formats are byte-identical, not
re-measured here.


Align the execution layer to the simplified design (c586667):

1. Lifecycle notification: chan struct{} doorbell -> chan ChunkID (depth 8).
   Ingestion sends the just-completed chunk id at each boundary; lifecycleLoop
   drains to the most-recent and resolves up to it; a FULL buffer fatals
   ("lifecycle fell N boundaries behind ingestion").
2. Done-channels: SUCCESS semantics. A chunk build closes its channel only
   after its artifacts are durable; a build that exhausts retries leaves the
   channel open and returns the error (cancelling gctx). Dependent index builds
   unblock via <-gctx.Done() and bail. buildTxhashIndex's .bin precondition is
   kept as a cheap defensive backstop.
3. Ingestion loop: indexed poll (core.GetLedger(ctx, seq)) instead of a
   RawLedgers stream. Injected CoreStreamOpener/LedgerStream -> CoreOpener/
   LedgerGetter. The clean-shutdown-vs-crash distinction moved to the daemon top
   level (ctx-cancelled = clean). Per-ledger one-atomic-synced-WriteBatch and the
   boundary CLOSE-before-create-next-key ordering are unchanged.
4. Progress: deriveCompleteThrough + deriveWatermark consolidated into one
   lastCommittedLedger(cat[, probe]), preserving the cold/positional terms, the
   earliest-1 clamp, and the chunk -1 sentinel exactly.
5. validateRangeProducible: the standalone pre-flight gate is removed; an
   unproducible chunk still fatals via backfillSource's per-chunk source
   selection / bounded wait.
6. Hot-volume-loss: detected lazily on the open that needs the DB (no eager
   all-ready-keys dir scan); a ready-but-won't-open hot DB still surfaces
   ErrHotVolumeLost with surgical-recovery guidance.
7. INV-4 audit: a frozen index key whose window straddles the floor (stale lo
   below the floor, hi at/above) is NOT a violation; a key wholly below the floor
   still is.

Whole fullhistory/streaming tree green; -race green on the changed concurrency.
The streaming package's full-suite go-test budget (~815s with the six
full-chunk-ingesting tick/convergence tests plus the non-short E2E, all
serial) exceeded the fixed 600s go-test timeout the gate runs under the
literal command (no -short, no -timeout). This was an environmental/suite-
budget issue, not a logic regression: every test passes given adequate time.

Fix, test-only and assertion-preserving (no production code touched, no
assertion weakened):
  - Mark the six heavy full-chunk-ingesting tick/convergence tests
    t.Parallel(). Each uses its own t.TempDir()/Catalog (and per-instance
    logger), so there is no shared package state; they overlap safely and
    stay green run together.
  - Remove an untracked local scratch timing test (zz_timing_test.go) that
    ingested a full 20k-ledger range with zero assertions (pure t.Logf
    instrumentation) and was never part of the committed suite.

With these, the literal gate command
  go test -count=1 ./cmd/stellar-rpc/internal/fullhistory/streaming/
completes well under the 600s default (460s internal, EXIT=0) with the
non-short E2E running and passing.

The two minor review items (the documented pull-seam narrowing of the
two-boundary ordering assertion in TestRunIngestionLoop_ReportsChunkBoundaries,
and the end-state-only assertion in TestLifecycleLoop_DrainsToMostRecent) were
explicitly accepted as-is in the review and are left unchanged.
…eration; widen e2e boundary-cross budget

- startup.go: tie the lifecycle goroutine to a per-iteration child ctx and
  cancel+join it on every startStreaming return path. superviseStreaming
  restarts startStreaming on the live daemon ctx after a restartable error, so a
  daemon-ctx-tied lifecycle loop would leak (blocked on the old channel) or run a
  tick CONCURRENTLY with the next iteration's lifecycle+ingestion -- two
  RunColdChunk passes truncating the same .pack/.idx. Restores the design's
  single-lifecycle-goroutine invariant.
- e2e_test.go: raise the both-boundaries-crossed Eventually budget 180s->600s.
  Crossing both boundaries is ~20k synced per-ledger WriteBatches racing the
  lifecycle freezes; fsync throughput is highly variable under -race + the
  package's parallel full-chunk ticks. Assertion unchanged.
…en clean-shutdown budget

Max-effort review panel (0 blockers/majors) flagged minor doc drift and one
latent test-timeout inconsistency; no production behavior change:
- startup.go: scope the lifecycle cancel+join defer comment to the paths it
  actually covers (the pre-defer error paths return before the goroutine starts).
- audit.go / progress.go: deriveCompleteThrough is now a test-only shim; point the
  doc-comments at the production lastCommittedLedger/completeThrough chain.
- e2e_test.go: widen waitClean 20s->60s. Post-cancel shutdown joins one in-flight
  lifecycle unit (unpreemptible freeze Finalize fsync + index build), slow under
  -race + contention -- matching d587b06's boundary-cross budget reasoning.
…ability (reference)

PR reference only, no code change:
- design-docs/full-history-implementation-issues.md: the 20-issue breakdown of the
  streaming daemon design (c586667), mapped to #722 / #777.
- design-docs/full-history-implementation-status.md: per-issue traceability from that
  breakdown to the code on this branch (status / files / tests), incl. the Issue 13
  v1-retirement deferral to #772 and composed deps (#765, #728/#729 + the #794 read
  counterpart, #764, stores, read-path #770/#772/#774).
…rouped filenames, split audit.go

No behavior change; pure organization. Kept as ONE package rather than
sub-packaging because the crash-injection hooks fire from INSIDE the real
catalog/protocol/sweep/ingest methods, so those must share a package to stay
package-private and keep the invariant tests meaningful.

- doc.go: new package architecture map (file -> layer), relocated from keys.go's
  package comment and expanded into a foundation -> catalog -> {config, freeze
  engine, ingestion} -> orchestration -> operability guide.
- Layer-grouped filenames (git mv, content unchanged):
    protocol.go -> catalog_protocol.go;  sweep.go -> catalog_sweep.go
    validate.go -> config_validate.go (+test);  lock.go -> config_lock.go (+test)
    build.go -> txindex.go (+test)
- Split the 853-line audit.go: types + Audit driver + RunAudit stay; the four
  invariant walks (INV-1..4) + filesystem helpers move to audit_invariants.go.
- gofmt: fix pre-existing unclean formatting in convergence_test.go /
  lifecycle_test.go / observability_test.go (whitespace only).
chowbao added 2 commits June 19, 2026 13:28
…ices.Sort, drop stale comment

Quality cleanup from a 4-angle review (reuse / simplify / efficiency / altitude);
no behavior change, full fullhistory suite green:
- eligibility.go: the prune scan now uses RetentionGate.ChunkBelowFloor /
  WindowBelowFloor -- the same 'past retention' predicate the discard scan and
  the read path already share -- instead of a hand-rolled -1 sentinel +
  lastCompleteChunkAt(floor-1) / IDFromLedger(floor). One source of truth;
  verified behavior-identical including the genesis-floor sentinel.
- audit_invariants.go: sort.Slice -> slices.Sort for []WindowID (drops the
  package's only sort import; matches sibling files).
- ingest/driver.go: drop a stale comment referencing buildHotIngesters, removed
  by the decision-(a) storage rework.
…ct + advisory log

Addresses a review finding; NO behavior change. Auto-extending the hot demotion
to the live chunk was considered and REJECTED — it breaks the deliberate, tested
ability to demote a hot sub-range below the live chunk without disturbing the
watermark (TestSurgicalRecovery_DemotionBelowLiveLeavesWatermarkUnchanged,
TestSurgicalRecovery_DemotesColdIndexAndHot). The precise [Lo,Hi] demotion is
intended; the gap was operator awareness, not behavior.

- RecoveryRequest doc + runbook: make explicit that the last-committed-ledger
  derivation is the MAX over "ready" hot chunks, so re-ingesting a tainted HOT
  chunk requires Hi to reach the live chunk; a sub-range whose Hi stops below it
  intentionally leaves the higher ready chunks (and the watermark) in place.
- RunSurgicalRecovery: log an informational note when a hot demotion stops below
  the live chunk (best-effort, read-only) so an operator who meant to re-ingest
  learns to extend Hi. The legitimate sub-range demotion is unaffected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant