
feat: Framework-controlled persistence with batch processing #158

Description

@Mercurial

Summary

Currently, each reducer is responsible for calling SaveChangesAsync() at the end of RollForwardAsync. This creates a 1:1 relationship between blocks processed and database writes, which is inefficient for high-throughput sync scenarios.

Current Behavior

// Each reducer does this:
public async Task RollForwardAsync(Block block)
{
    // Process block...
    dbContext.Add(entity);
    await dbContext.SaveChangesAsync();  // ← Every block = 1 DB write
}

At ~10 blocks/second during sync, this means 10 database round-trips per second per reducer; with R reducers, that becomes roughly 10 × R round-trips per second.

Proposed Behavior

The framework (CardanoIndexWorker) should control when persistence happens, not the individual reducers.

// Reducer just stages changes:
public async Task RollForwardAsync(Block block, DbContext dbContext)
{
    // Process block...
    dbContext.Add(entity);
    // No SaveChangesAsync - framework handles this
}

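// Framework controls persistence (Stopwatch is from System.Diagnostics):
var pendingBlocks = 0;
var sinceLastFlush = Stopwatch.StartNew();

foreach (var block in blocks)
{
    await reducer.RollForwardAsync(block, dbContext);
    pendingBlocks++;

    // isAtTip: whether this block is the current chain tip
    if (ShouldFlush(pendingBlocks, sinceLastFlush.Elapsed, isAtTip))
    {
        await dbContext.SaveChangesAsync();
        pendingBlocks = 0;          // start a new batch
        sinceLastFlush.Restart();
    }
}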
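A minimal sketch of how ShouldFlush could combine the block-count, time-interval, and tip triggers. FlushPolicy and its parameter names are hypothetical; the thresholds reuse the defaults proposed under Phase 1 (100 blocks, 5000 ms). Because it is pure logic with no database dependency, each trigger can be unit-tested in isolation.

```csharp
using System;

// Hypothetical flush policy for the batching loop; not part of any existing API.
public sealed class FlushPolicy
{
    public int BatchSize { get; }
    public TimeSpan FlushInterval { get; }
    public bool FlushOnTip { get; }

    public FlushPolicy(int batchSize = 100, int flushIntervalMs = 5000, bool flushOnTip = true)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        if (flushIntervalMs <= 0) throw new ArgumentOutOfRangeException(nameof(flushIntervalMs));
        BatchSize = batchSize;
        FlushInterval = TimeSpan.FromMilliseconds(flushIntervalMs);
        FlushOnTip = flushOnTip;
    }

    // pendingBlocks: blocks staged since the last flush.
    // sinceLastFlush: wall-clock time since the last flush.
    // isAtTip: true when the block just processed is the current chain tip.
    public bool ShouldFlush(int pendingBlocks, TimeSpan sinceLastFlush, bool isAtTip)
    {
        if (pendingBlocks == 0) return false;              // nothing staged, nothing to write
        if (pendingBlocks >= BatchSize) return true;       // block count threshold reached
        if (sinceLastFlush >= FlushInterval) return true;  // time interval exceeded
        return FlushOnTip && isAtTip;                      // caught up: persist immediately
    }
}
```

Returning false when nothing is staged avoids issuing empty SaveChangesAsync calls while the worker idles at the tip.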

Benefits

  1. Batching - Save every N blocks (e.g., 100 blocks = 100x fewer DB round-trips)
  2. Transaction boundaries - Framework can wrap batches in transactions for atomic rollback
  3. Cleaner separation of concerns - Reducers handle "what changes", framework handles "when to persist"
  4. Shared DbContext - Dependent reducers in the same chain could share one context per batch
  5. Configurable flush strategy - Flush on block count, time interval, reaching tip, or before rollback

Implementation Plan

Phase 1: Core Infrastructure

  • Add BatchSize configuration option (default: 100)
  • Add FlushIntervalMs configuration option (default: 5000)
  • Modify the IReducer<T> interface so reducers are no longer required to call SaveChanges
  • Update CardanoIndexWorker.ProcessRollforwardAsync to batch saves

Phase 2: DbContext Management

  • Create one scoped DbContext per batch instead of one per block
  • Implement flush triggers:
    • Block count threshold reached
    • Time interval exceeded
    • Tip reached (always flush when caught up)
    • Before processing rollback
  • Add transaction wrapper for batch atomicity
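The interplay between the block-count trigger and the "before processing rollback" trigger can be illustrated with an in-memory simulation. ChainEvent, RollForward, RollBack and BatchDriver are hypothetical names, and "persisting" is modeled by moving staged slots into a list; in CardanoIndexWorker the flush would presumably be a SaveChangesAsync call, ideally inside the batch transaction wrapper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical chain events; a real worker would receive these from the node.
public abstract record ChainEvent;
public sealed record RollForward(ulong Slot) : ChainEvent;
public sealed record RollBack(ulong ToSlot) : ChainEvent;

// Simulates staging vs. persisting: _staged plays the role of tracked-but-unsaved
// changes, Persisted plays the role of rows already written to the database.
public sealed class BatchDriver
{
    private readonly int _batchSize;
    private readonly List<ulong> _staged = new();
    public List<ulong> Persisted { get; } = new();
    public int FlushCount { get; private set; }

    public BatchDriver(int batchSize) =>
        _batchSize = batchSize > 0 ? batchSize : throw new ArgumentOutOfRangeException(nameof(batchSize));

    public void Handle(ChainEvent e)
    {
        switch (e)
        {
            case RollForward f:
                _staged.Add(f.Slot);                          // reducer only stages changes
                if (_staged.Count >= _batchSize) Flush();     // block count threshold
                break;
            case RollBack b:
                Flush();                                      // always flush before a rollback
                Persisted.RemoveAll(slot => slot > b.ToSlot); // rollback deletes run on persisted rows
                break;
        }
    }

    public void Flush()
    {
        if (_staged.Count == 0) return;
        Persisted.AddRange(_staged);
        _staged.Clear();
        FlushCount++;
    }
}
```

Flushing first means the rollback operates on a fully persisted, consistent state rather than on a mix of saved rows and entities that are still only tracked in memory.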

Phase 3: Dependent Reducer Optimization

  • Share DbContext across reducer dependency chains
  • Ensure parent reducer changes are visible to dependents within the same batch
  • Handle partial batch failures gracefully
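Sharing one DbContext only makes a parent's staged changes available to a dependent if the parent runs first for each block. A minimal sketch of deriving that execution order with Kahn's topological sort, assuming reducer dependencies can be expressed as a map from reducer name to the names of its parents (ReducerOrdering and the map shape are hypothetical):

```csharp
using System;
using System.Collections.Generic;

public static class ReducerOrdering
{
    // dependsOn maps each reducer name to the reducers whose changes it reads.
    // Returns an order in which every parent runs before its dependents.
    public static List<string> Order(IReadOnlyDictionary<string, string[]> dependsOn)
    {
        var inDegree = new Dictionary<string, int>();
        var dependents = new Dictionary<string, List<string>>();

        foreach (var (reducer, parents) in dependsOn)
        {
            inDegree.TryAdd(reducer, 0);
            foreach (var parent in parents)
            {
                inDegree.TryAdd(parent, 0);   // parents may not appear as keys
                inDegree[reducer]++;
                if (!dependents.TryGetValue(parent, out var list))
                    dependents[parent] = list = new List<string>();
                list.Add(reducer);
            }
        }

        // Start with reducers that depend on nothing.
        var ready = new Queue<string>();
        foreach (var (name, degree) in inDegree)
            if (degree == 0) ready.Enqueue(name);

        var order = new List<string>();
        while (ready.Count > 0)
        {
            var current = ready.Dequeue();
            order.Add(current);
            if (!dependents.TryGetValue(current, out var children)) continue;
            foreach (var child in children)
                if (--inDegree[child] == 0) ready.Enqueue(child);
        }

        // Any reducer left unprocessed is part of a cycle.
        if (order.Count != inDegree.Count)
            throw new InvalidOperationException("Reducer dependencies contain a cycle.");
        return order;
    }
}
```

One EF Core detail worth keeping in mind for this phase: entities added but not yet saved are visible through DbSet<T>.Local and the ChangeTracker, but LINQ queries against the DbSet run against the database and will not return them, so dependents may need to consult Local for rows their parents staged earlier in the same batch.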

Configuration Example

{
  "Sync": {
    "Batch": {
      "Size": 100,
      "FlushIntervalMs": 5000,
      "FlushOnTip": true
    }
  }
}
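A possible options class for the "Sync:Batch" section, as a sketch. BatchOptions and Validate are hypothetical names. The properties follow the JSON keys (Size rather than the BatchSize name used in Phase 1), since the configuration binder matches property names to keys; with Microsoft.Extensions.Configuration.Binder it could be bound via configuration.GetSection("Sync:Batch").Get<BatchOptions>().

```csharp
using System;

// Hypothetical options type mirroring the "Sync:Batch" configuration section.
// Defaults follow the proposal: Size 100, FlushIntervalMs 5000, FlushOnTip true.
public sealed class BatchOptions
{
    public const string SectionPath = "Sync:Batch";

    public int Size { get; set; } = 100;
    public int FlushIntervalMs { get; set; } = 5000;
    public bool FlushOnTip { get; set; } = true;

    public TimeSpan FlushInterval => TimeSpan.FromMilliseconds(FlushIntervalMs);

    // Fail fast at startup instead of silently running with unusable values.
    public void Validate()
    {
        if (Size < 1)
            throw new ArgumentOutOfRangeException(nameof(Size), Size, "Batch size must be at least 1.");
        if (FlushIntervalMs < 1)
            throw new ArgumentOutOfRangeException(nameof(FlushIntervalMs), FlushIntervalMs, "Flush interval must be positive.");
    }
}
```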

Backward Compatibility

  • Existing reducers that call SaveChangesAsync() should continue to work
  • Framework-level batching can be opt-in initially via configuration
  • Deprecation warning for reducers calling SaveChanges when batching is enabled
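One way to keep legacy reducers working while emitting the deprecation warning is to route reducer-initiated saves through a guard that still persists but warns once per reducer. This is a hypothetical sketch: LegacySaveGuard and its constructor parameters are illustrative names, and the save delegate stands in for DbContext.SaveChangesAsync. In EF Core the same check could instead live in an overridden SaveChangesAsync or a SaveChangesInterceptor.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical guard for reducers that still call SaveChangesAsync themselves.
// Not thread-safe: assumes reducers save sequentially within a batch.
public sealed class LegacySaveGuard
{
    private readonly Func<CancellationToken, Task<int>> _save;
    private readonly bool _batchingEnabled;
    private readonly Action<string> _warn;
    private readonly HashSet<string> _warnedReducers = new();

    public LegacySaveGuard(Func<CancellationToken, Task<int>> save, bool batchingEnabled, Action<string> warn)
    {
        _save = save ?? throw new ArgumentNullException(nameof(save));
        _batchingEnabled = batchingEnabled;
        _warn = warn ?? throw new ArgumentNullException(nameof(warn));
    }

    public Task<int> SaveChangesAsync(string reducerName, CancellationToken ct = default)
    {
        // Warn only the first time each reducer saves while batching is on.
        if (_batchingEnabled && _warnedReducers.Add(reducerName))
            _warn($"Reducer '{reducerName}' calls SaveChangesAsync while batching is enabled; this is deprecated.");
        return _save(ct);   // still persist, so existing reducers keep working
    }
}
```

A caveat to check during review: if reducers share a DbContext per batch (Phase 3), a legacy reducer's SaveChangesAsync call also persists every other reducer's staged changes, which effectively shrinks the batch.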

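Performance Impact (estimated)

Metric                                     | Current          | With batching (batch size 100)
-------------------------------------------|------------------|-------------------------------
DB writes/sec (syncing at ~10 blocks/sec)  | ~10/reducer      | ~0.1/reducer
Latency per block                          | ~50-100ms        | ~1-5ms
Throughput                                 | ~10 blocks/sec   | ~100+ blocks/sec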
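The write-rate figures follow from simple arithmetic. Writing r for blocks processed per second, N for the batch size, and R for the number of reducers that each save through their own DbContext (symbols introduced here for illustration):

```latex
W_{\text{total}} = \frac{r \cdot R}{N}, \qquad
W_{\text{reducer}} = \frac{r}{N}
```

At r = 10 blocks/sec, N = 1 (one save per block) gives 10 writes/sec per reducer, and N = 100 gives 10/100 = 0.1. At the projected ~100 blocks/sec, N = 100 gives about 1 write/sec per reducer: roughly 10× fewer writes than today while processing about 10× more blocks. Where a dependency chain shares one DbContext, a single SaveChangesAsync covers the whole chain, so R counts contexts rather than reducers.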

Related

  • Reducers will need to be updated to remove SaveChangesAsync() calls
  • Rollback handling needs review to ensure batch boundaries are respected

Metadata

  • Assignees: No one assigned
  • Labels: No labels
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests