Skip to content

Conversation

@wzhero1
Copy link

@wzhero1 wzhero1 commented Jan 13, 2026

Purpose

This PR implements parallel snapshot expiration to improve the performance of large-scale cleanup operations.

Motivation:

  • Serial file deletion becomes a performance bottleneck for tables with large amounts of data
  • Current implementation cannot leverage Flink's distributed computing capabilities

Changes:

  1. Core module refactoring - Split the original serial expiration logic into Planner/Executor architecture:

    • ExpireSnapshotsPlanner: Computes expiration plan including snapshot range and four types of tasks
    • ExpireSnapshotsExecutor: Executes deletion tasks based on task type
    • ExpireSnapshotsPlan: Data structure containing task lists and protection set
    • SnapshotExpireTask: Represents a single expiration task with TaskType enum
    • ProtectionSet: Immutable set of protected manifests and tagged snapshots
    • DeletionReport: Execution report for each task
  2. Flink Action parallel mode - ExpireSnapshotsAction supports --parallelism parameter:

    • Worker Phase: Parallel deletion of data files and changelog files using RangePartitionedExpireFunction
    • Sink Phase: Serial deletion of manifests and snapshot metadata using SnapshotExpireSink
    • Tasks are partitioned by snapshot ID range to maximize cache locality

Execution modes:

  • Serial mode (default): parallelism=null or ≤1 → Uses ExpireSnapshotsImpl
  • Parallel mode: parallelism>1 + --force_start_flink_job → Uses Flink distributed execution

Tests

Unit Tests:

  • ExpireSnapshotsPlanTest - Tests task partitioning logic (partitionTasksBySnapshotRange)
  • DeletionReportTest - Tests deletion report serialization
  • ExpireSnapshotsTest - Core expiration logic tests (refactored to use new Planner/Executor)

Integration Tests:

  • ExpireSnapshotsActionITCase - Flink parallel mode integration tests
  • ExpireSnapshotsProcedureITCase - Procedure integration tests

API and Format

New CLI parameters for expire-snapshots action:

Parameter Type Default Description
--parallelism Integer null Parallelism for parallel mode (requires >1)

No storage format changes.

Backward compatible: Default behavior (serial mode) remains unchanged.

Documentation

Yes, this introduces a new feature: parallel snapshot expiration.

Documentation should be added to describe:

  • New --parallelism parameter for expire-snapshots action
  • Requirement: parallel mode needs both --parallelism > 1 and --force_start_flink_job
  • Performance recommendations for large-scale cleanup

@wzhero1 wzhero1 force-pushed the feat/paimon-expire-snapshot-parallel-opt branch from 69b0a18 to f61b9b4 Compare January 13, 2026 11:13
@wzhero1 wzhero1 changed the title [flink] Support parallelism snapshot expire [core][flink] Support parallelism snapshot expire Jan 13, 2026
@wzhero1 wzhero1 force-pushed the feat/paimon-expire-snapshot-parallel-opt branch from f61b9b4 to be7a32b Compare January 13, 2026 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant