ci: restructure workflows to use unified builder image on self-hosted runners by jackluo923 · Pull Request #112 · y-scope/presto

jackluo923 · 2025-11-28T18:00:54Z

Summary

Restructure CI workflows to use a unified builder image on self-hosted runners, replacing the branch-specific CI in release-0.293-clp-connector-snapshot.

Comparison: Three CI Approaches

Aspect	Upstream (prestodb/presto)	release-0.293-clp-connector-snapshot	This PR (ci-snapshot)
Runners	GitHub-hosted (ephemeral)	GitHub-hosted (ubuntu-22.04)	Self-hosted (ephemeral)
CI Structure	Separate independent workflows	Separate independent workflows	Unified ci.yml orchestrator
Builder Image	presto-native-dependency (C++ deps only, no caches)	Hardcoded from upstream Docker Hub: `prestodb/presto-native-dependency:0.293-...` (C++ deps only, no caches)	unified-builder (C++ deps + Maven deps + pre-warmed ccache, built on GitHub runner base)
Image Tag Strategy	Hash-based, manual update	Branch name only (e.g., `release-0.293-clp-connector-snapshot`)	Version + timestamp + hash (e.g., `0.293-BETA-20250522140509-484b00e`)
ccache Strategy	Stash/restore via Apache Infra	Stash/restore via Apache Infra	Pre-warmed in builder image
Branch Support	Any branch	Only `release-0.293-clp-connector-snapshot*` branches	Any branch
Image Building	On release only	Build from scratch (~1hr)	Download artifacts + package (~5min)

Design Improvements

1. Self-Hosted Runners with Pre-Built Dependencies

Switching from GitHub-hosted to self-hosted ephemeral runners with a unified builder image:

Aspect	GitHub-hosted (Current)	Self-hosted + Builder Image (This PR)
Resources	Limited (7GB RAM, 2 CPU)	Dedicated hardware (more CPU/RAM)
Dependencies	Install each run	Pre-installed in builder image
ccache	Restore from Apache stash	Pre-warmed in builder image
Maven deps	Download each run	Pre-downloaded in builder image
Build time	Slower (cold start)	Faster (warm cache)

Benefits:

Dedicated hardware with more resources for faster builds
Pre-warmed caches eliminate cold start overhead
Self-contained: no external stash service dependencies

2. Branch-Agnostic CI

The old CI in release-0.293-clp-connector-snapshot only works for that specific branch:

# Old: Branch-locked
push: ${{ startsWith(github.ref, 'refs/heads/release-0.293-clp-connector-snapshot') }}

The new CI works on any branch, enabling feature branch testing and multi-version support.

3. Automatic Version Inference

Images are automatically tagged with meaningful versions extracted from pom.xml:

Immutable tag: <version>-<TYPE>-<timestamp>-<hash> (e.g., 0.293-BETA-20250527143021-62303d8)
SNAPSHOT tag: <version>-<TYPE>-SNAPSHOT (e.g., 0.293-BETA-SNAPSHOT)

No manual version management required. Each commit produces a uniquely identifiable image.

4. Configurable Version Type

Single configuration point for image version type:

env:
  IMAGE_VERSION_TYPE: 'BETA'  # Options: RELEASE, BETA, DEV

Applied consistently to both presto and prestissimo images.

5. Unified Builder Image with Hash-Based Tagging

The builder image is fundamentally different from upstream's presto-native-dependency:

Aspect	Upstream presto-native-dependency	Our unified-builder
Base	CentOS/Ubuntu minimal	GitHub Actions runner image
C++ deps	✓ Installed	✓ Installed
Maven deps	✗ Not included	✓ Pre-downloaded
ccache	✗ Empty	✓ Pre-warmed (~90% hit rate)
Node.js/Yarn	✗ Not included	✓ Installed
Docker layer cache	✗ No	✓ Optimized layers

The builder image tag is computed from dependency file hashes:

Only rebuilt when dependencies actually change
Eliminates hardcoded image versions that drift out of sync
Self-contained: no external stash service dependencies

Performance Benefits

Metric	Old CI (snapshot branch)	New CI	Improvement
C++ Build	Cold build + Apache stash restore	Pre-warmed ccache (~90% hit rate)	~50% faster
Maven Build	Download deps each run	Pre-downloaded in builder image	~30% faster
Image Building	Build from scratch (~60 min)	Download artifacts + package (~5 min)	~90% faster
Dependency Install	Per-workflow installation	Pre-built unified image	Eliminated

ccache Pre-warming

The builder image contains a pre-warmed ccache from a previous successful build. Incremental builds achieve high cache hit rates, dramatically reducing C++ compilation time.

Artifact-Based Image Building

Instead of rebuilding prestissimo from scratch for the Docker image:

Old: checkout → install deps → cmake → ninja → docker build (~60 min)
New: download artifact → docker build (~5 min)

Limitations of Current CI (release-0.293-clp-connector-snapshot)

Branch Lock-in: Only works for release-0.293-clp-connector-snapshot* branches
- Feature branches cannot produce Docker images
- Cannot easily test CI changes on other branches
Hardcoded Dependencies: Builder image version is hardcoded to upstream's prestodb/presto-native-dependency:0.293-... from Docker Hub. This image is built by upstream prestodb/presto CI and requires manual updates when dependencies change. The snapshot branch is pinned to an older 0.293 version while upstream master has moved to 0.296+
No Version Tracking: Images tagged only by branch name, no timestamp/hash for traceability
Slow Image Builds: Rebuilds everything from scratch for Docker images (~1 hour)
External Service Dependency: Relies on Apache Infrastructure stash service for ccache
Limited Resources: GitHub-hosted runners have limited CPU/RAM compared to dedicated hardware

Maintenance Benefits

Self-Contained: No external stash service dependencies
Auto-Updating Builder: Hash-based tags ensure builder image stays in sync with dependencies
Reproducible: Immutable image tags enable exact environment recreation
Documented: Comprehensive inline documentation for novice developers
Configurable: Single-point configuration for version type and Java version

Job Dependency Graph

compute-builder-tag ─► create-builder-image ─┬─► prestocpp ────┬─► integration-tests
                                             ├─► presto ───────┘
                                             └─► presto-tests

prestocpp: uploads presto-native-build artifact, builds prestissimo image
presto: uploads presto-server/presto-cli artifacts, builds presto image

Artifacts and Images

Artifacts (1-day retention)

Artifact	Contents
presto-server	presto-server-*.tar.gz
presto-cli	presto-cli-*-executable.jar
presto-native-build	presto_server, velox_functions_remote_server_main

Docker Images (ghcr.io)

Image	Description
unified-builder	Build environment with all dependencies
presto	Java coordinator runtime
prestissimo	C++ native worker runtime

Summary by CodeRabbit

Release Notes

New Features
- Added YAML-based metadata support for CLP plugin configuration
- Introduced Pinot storage backend support for CLP splits and metadata
- Enabled Top-N query optimization pushdown for improved performance
Bug Fixes
- Updated time-zone handling for date and time literal edge cases
Chores
- Updated Joda-Time dependency to 2.13.1
- Added SnakeYAML version constraint for improved security and compatibility

_{✏️ Tip: You can customize this high-level summary in your review settings.}

…ilds Change Docker image tagging from static :dev tags to dynamic branch-based tags, and update all workflow conditions to accept any branch with the release-0.293-clp-connector-snapshot prefix. Changes: - Docker image tags now use actual branch name via type=ref,event=branch * Before: ghcr.io/repo/prestissimo-worker:dev * After: ghcr.io/repo/prestissimo-worker:release-0.293-clp-connector-snapshot-nov29 - Push conditions changed from exact match to prefix match using startsWith() - Cancel-in-progress logic updated to exclude all snapshot prefix branches - PR title checks now accept release-0.293-clp-connector-snapshot* pattern Example: Pushing to branch "release-0.293-clp-connector-snapshot-nov29" will: - Build and push images tagged with :release-0.293-clp-connector-snapshot-nov29 - Run all CI jobs to completion without cancellation - Enable PR title validation for PRs targeting this branch This allows multiple snapshot branches to coexist with unique image tags.

Add alternative provider implementations for metadata and split discovery, enabling flexible deployment configurations without MySQL dependency. New providers: - ClpYamlMetadataProvider: Reads table/column metadata from YAML config files instead of querying MySQL database. Supports nested schemas and polymorphic types through hierarchical YAML structure. - ClpPinotSplitProvider: Queries Apache Pinot database for archive file paths via HTTP/JSON API instead of MySQL. Determines split types based on file extensions (.clp.zst for IR, otherwise ARCHIVE). Configuration: - Add clp.metadata-provider-type=YAML option for YAML-based metadata - Add clp.split-provider-type=PINOT option for Pinot-based split discovery - Add clp.metadata-yaml-path config for YAML metadata file location The YAML metadata provider parses a two-level structure: 1. Main metadata file: maps connector → schema → table → schema_file_path 2. Individual table schema files: define column names and types This provides deployment flexibility - users can now choose between: - Database-driven (MySQL for both metadata and splits) - Hybrid (YAML metadata + Pinot splits) - Or any combination based on their infrastructure Dependencies: Adds jackson-dataformat-yaml and snakeyaml 2.1

Ensure all timestamp literals are converted to nanoseconds before generating KQL and SQL filters for pushdown, regardless of their original precision (milliseconds, microseconds, etc.). Problem: Presto supports multiple timestamp precisions (TIMESTAMP with millisecond precision, TIMESTAMP_MICROSECONDS, etc.), but CLP's query engine expects timestamps in a consistent nanosecond format. Previously, timestamp literals were passed through without precision normalization, causing incorrect results when different timestamp types were used in queries. Solution: Add timestamp normalization logic in ClpFilterToKqlConverter: - tryEnsureNanosecondTimestamp(): Detects TIMESTAMP and TIMESTAMP_MICROSECONDS types - ensureNanosecondTimestamp(): Converts timestamp values to nanoseconds using: 1. Extract epoch seconds via TimestampType.getEpochSecond() 2. Extract nanosecond fraction via TimestampType.getNanos() 3. Convert to total nanoseconds: SECONDS.toNanos(seconds) + nanosFraction Applied to: - BETWEEN operator: Both lower and upper bounds - Comparison operators: All timestamp literal values (=, !=, <, >, <=, >=) This ensures consistent timestamp representation in both KQL filters (sent to CLP engine) and SQL filters (used for metadata filtering), regardless of the timestamp precision used in the original Presto query. Example: Query: WHERE timestamp BETWEEN TIMESTAMP '2025-01-01' AND TIMESTAMP '2025-01-02' Before: Literal values passed as milliseconds or microseconds After: Both bounds normalized to nanosecond representation

Implement a plan optimizer that pushes down LIMIT + ORDER BY operations to reduce split scanning when querying on timestamp metadata columns. The optimizer detects TopN patterns in query plans and uses archive metadata (timestamp bounds, message counts) to select only the minimum set of archives needed to guarantee correct results. Key features: - Detects and rewrites TopN → [Project] → [Filter] → TableScan patterns - Pushes TopN specifications through ClpTableLayoutHandle - Handles overlapping archives by merging into non-overlapping groups - Supports both ASC (earliest-first) and DESC (latest-first) ordering - Uses worst-case analysis to ensure correctness when archive ranges overlap Implementation: - ClpTopNSpec: Data structure for TopN specifications (limit + orderings) - ClpComputePushDown: Enhanced plan optimizer with TopN detection - ClpMySqlSplitProvider: Archive selection logic using metadata queries - Comprehensive test coverage for various TopN scenarios Performance: For queries like "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100", this eliminates scanning of archives that cannot contain the top results, significantly reducing I/O.

@JsonProperty

Implement TopN pushdown optimization in ClpPinotSplitProvider to minimize archive scanning when executing LIMIT + ORDER BY queries on timestamp metadata columns. The optimization uses archive metadata (timestamp bounds and message counts) to intelligently select only the minimum set of archives needed to guarantee correct TopN results, significantly reducing I/O for time-range queries. Implementation details: Archive selection algorithm: - Fetches archive metadata (file path, creation time, modification time, message count) - Merges overlapping archives into non-overlapping groups by timestamp ranges - For DESC ordering: includes newest group + older groups until limit is covered - For ASC ordering: includes oldest group + newer groups until limit is covered - Uses worst-case analysis to ensure correctness when archives overlap Code structure: - ClpPinotSplitProvider.listSplits(): Detects TopN specs and routes to optimized path - selectTopNArchives(): Implements the archive selection algorithm - toArchiveGroups(): Merges overlapping archives into logical components - ArchiveMeta: Represents archive metadata with validation - ArchiveGroup: Represents merged archive groups for overlap handling ClpTopNSpec enhancements: - Made fields private with proper encapsulation - Added @JsonProperty annotations to getters for correct serialization - Maintains immutability for thread-safety Code quality improvements: - Proper exception handling with context-aware error messages - Input validation in constructors (bounds checking, null checks) - Extracted determineSplitType() helper to eliminate duplication - Made fields final where immutable (pinotDatabaseUrl, ArchiveMeta fields) - Improved logging with table names and SQL queries for debugging - Better encapsulation: private fields, getters-only access Performance impact: For queries like "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100", this eliminates scanning of archives outside the time range of interest, providing substantial I/O reduction for large archive sets.

…ability.

Implements dynamic schema discovery from YAML metadata files, allowing the CLP connector to support multiple schemas beyond just "default" (e.g., clp.dev.table, clp.prod.table). Key changes: - Add default listSchemaNames() method to ClpMetadataProvider interface for DRY principle - Implement dynamic schema discovery in ClpYamlMetadataProvider with thread-safe caching - Update ClpMetadata to delegate schema listing to metadata providers - Add comprehensive error handling with graceful fallback to default schema - Optimize performance with double-checked locking and ObjectMapper reuse - Add 15+ unit tests covering all edge cases and error scenarios - Fix checkstyle violations in UberClpPinotSplitProvider and ClpPinotSplitProvider The implementation is backward compatible - existing single-schema setups continue to work without configuration changes. Multi-schema support is activated automatically when the YAML metadata file contains multiple schema entries. Example YAML structure: ```yaml clp: default: logs: /path/to/default/logs.yaml dev: test_logs: /path/to/dev/logs.yaml prod: production_logs: /path/to/prod/logs.yaml ``` Performance optimizations: - Thread-safe caching prevents repeated YAML parsing - Double-checked locking pattern for lazy initialization - Reused ObjectMapper instances reduce object creation overhead - Synchronized table schema map updates ensure thread safety Testing: - All 11 tests in TestClpYamlMetadataProvider pass - All 4 tests in TestClpMultiSchema pass - Fixed testListSchemaNamesNoCatalogField to handle missing catalog gracefully

…lConverter (#90)

…test failures (#103)

…restodb#26254) ## Description Fix flakytests that uses `experimental.spiller-spill-path` as `/tmp/presto/spills/` in query runner ## Motivation and Context prestodb#25890 ## Impact  ## Test Plan  ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == NO RELEASE NOTE == ```

Upgrade joda-time to 2.13.1 to support new timezone data including America/Coyhaique. Update tests to use dates where timezone gaps still exist in the updated tzdata: - testLocallyUnrepresentableDateLiterals: 1970-01-01 -> 1932-04-01 - testLocallyUnrepresentableTimeLiterals: use 2017-04-02 or 2012-04-01 Cherry-picked from upstream: prestodb/presto@0a504d20e6

… runners Key changes: - Add comprehensive CI architecture documentation explaining: - Terminology (presto, prestocpp, prestissimo) - Unified builder image strategy for ephemeral runners - Job dependency graph - Comparison with upstream (prestodb/presto) CI - Performance benefits of pre-warmed ccache and Maven deps - Consolidate presto-java8/java17 jobs into single matrix-based `presto` job - ARTIFACT_JAVA_VERSION controls which version uploads artifacts/images (default: '8') - Add prestissimo image building to prestocpp workflow - Downloads artifacts from build job, packages into runtime image - Uses same tagging strategy as presto image (immutable + SNAPSHOT tags) - Centralize IMAGE_VERSION_TYPE configuration (set to 'BETA') - Applied to both presto and prestissimo images - Document artifacts (presto-server, presto-cli, presto-native-build) and Docker images (unified-builder, presto, prestissimo)

jackluo923 · 2025-11-28T18:01:22Z

Duplicate, updating PR #110 instead

coderabbitai · 2025-11-28T18:03:09Z

Caution

Review failed

The pull request is closed.

Walkthrough

This PR introduces a comprehensive CI/CD restructuring with a unified builder image strategy featuring dependency-based caching, extensive CLP plugin enhancements including YAML metadata support and Pinot split providers with TopN pushdown optimization, and test infrastructure improvements with UUID-based spill path isolation to prevent file collisions.

Changes

Cohort / File(s)	Summary
GitHub Actions CI Workflow Restructuring `.github/workflows/ci.yml`, `.github/workflows/compute-builder-tag.yml`, `.github/workflows/create-builder-image.yml`, `.github/workflows/presto-build.yml`, `.github/workflows/prestocpp-linux-build-and-unit-test.yml`, `.github/workflows/tests.yml`, `.github/workflows/integration-tests.yml`	New unified builder image CI pipeline with dependency-based tag computation and conditional image building; orchestrates multi-job builds for Java (presto) and C++ (prestocpp) components with shared pre-warmed caches; consolidates Maven/Node.js dependency pre-downloading.
Workflow File Removals & Updates `.github/workflows/maven-checks.yml` (removed), `.github/workflows/prestissimo-worker-images-build.yml` (removed), `.github/workflows/docs.yml`, `.github/workflows/pr-title-checks.yaml`, `.github/workflows/prestocpp-format-and-header-check.yml`	Removes legacy Maven-checks and prestissimo image workflows; updates concurrency rules to use prefix-based branch exclusion for snapshot branches.
Download & Docker Build Support `.github/bin/download_nodejs`, `.github/dockerfiles/yscope-presto-builder.dockerfile`	Adds configurable MAVEN_REPO support for cached Node.js/Yarn downloads; introduces unified builder Dockerfile with pre-warmed ccache, pre-downloaded dependencies (JDK, Node.js, Maven), and CMake setup for both presto and prestocpp.
CLP Plugin Configuration & Metadata `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java`	Adds metadataYamlPath configuration; extends enums (MetadataProviderType.YAML); introduces YAML-based metadata provider with schema/table discovery from YAML files; adds default listSchemaNames method to interface.
CLP Module Binding & Split Framework `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java`	Introduces map-based provider bindings for metadata/split/split-filter types; adds equals/hashCode to ClpSplit; updates listSchemaNames to delegate to provider.
CLP Pinot Split Providers `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java`	Adds Pinot-based split providers for standard and Uber variants; implements TopN archive selection with overlap merging; Uber variant customizes Pinot endpoint and metadata table naming.
CLP Split Filter Providers `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java`	Adds remapColumnName abstract method to API; updates checkContainsRequiredFilters to accept Set of push-down variables; introduces Pinot and Uber Pinot implementations; Uber variant adds TEXT_MATCH transformations for merged text index queries.
CLP TopN Pushdown Optimization `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java`	Adds ClpTopNSpec class modeling limit and ordering; extends ClpTableLayoutHandle with metadataQueryOnly and topN fields; implements extensive TopN visit handler in optimizer with ordering validation and metadata SQL generation.
CLP Expression & Filter Conversion `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java`	Adds pushDownVariables tracking to ClpExpression with new constructor overloads; propagates push-down state through all translation paths; adds nanosecond-precision timestamp normalization for KQL/metadata SQL generation.
CLP Optimization Refactoring `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java`	Consolidates subtree rewriting logic into visitor-based traversal; adds explicit TableScanNode handler; removes internal rewriting helpers.
CLP Test Resources `presto-clp/src/test/resources/test-.yaml`, `presto-clp/src/test/resources/test-.json`	Adds YAML test schemas for default/multiple schemas, orders, users, cockroachdb; adds JSON split-filter configurations for Pinot and TopN test scenarios.
CLP Test Classes `presto-clp/src/test/java/.../TestClpTopN.java`, `presto-clp/src/test/java/.../TestClpMultiSchema.java`, `presto-clp/src/test/java/.../TestClpYamlMetadata.java`, `presto-clp/src/test/java/.../TestClp*.java` (multiple)	Adds comprehensive test classes for TopN optimization, YAML metadata discovery, Pinot split providers (standard/Uber), split filter providers; includes schema/table/split discovery and transformation validation.
CLP Test Database Setup `presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java`	Extends archives table schema with messageCount column; updates INSERT/MERGE logic and ArchivesTableRow constructor.
Test Infrastructure UUID Updates `presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedSpilled.java`, `presto-tests/src/test/java/com/facebook/presto/tests/TestSpilled*.java`, `presto-thrift-connector/src/test/java/.../TestThriftDistributedQueriesIndexed.java`	Appends UUID to spill paths across multiple test classes to ensure per-run directory isolation and prevent file collisions.
Timezone Test Updates `presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestEngineOnlyQueries.java`, `presto-native-tests/src/test/java/.../TestDistributedEngineOnlyQueries.java`, `presto-tests/src/test/java/.../TestH2QueryRunner.java`	Updates test dates for unrepresentable time scenarios (1970-01-01 → 1932-04-01, 2012-04-01 → 2017-04-02); removes java.util.Objects import.
Runtime & Dependencies `pom.xml`, `presto-clp/pom.xml`, `presto-native-execution/pom.xml`, `presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile`, `presto-native-execution/velox`	Updates Joda-Time to 2.13.1; adds jackson-dataformat-yaml and SnakeYAML 2.1 upper-bound dependency management; refactors Prestissimo runtime Dockerfile to use pre-built binary and simplified runtime library extraction.
Test Configuration & Disabled Tests `presto-native-execution/src/test/java/.../TestPrestoNativeAsyncDataCacheCleanupAPI.java`	Disables two test methods via `enabled = false` annotation.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas requiring extra attention:

CI Workflow Orchestration (.github/workflows/ci.yml, compute-builder-tag, create-builder-image): Complex multi-job dependency graph with cross-workflow communication, caching strategy validation, and artifact flow; verify builder image tag computation is deterministic and cache hits occur as expected.
CLP Pinot Integration (ClpPinotSplitProvider, ClpUberPinotSplitProvider, related tests): New HTTP-based Pinot query execution with archive metadata fetching and TopN archive selection algorithm; validate correctness of overlap merging logic and Uber-specific endpoint/table naming conventions.
TopN Pushdown Logic (ClpComputePushDown, ClpTableLayoutHandle, ClpTopNSpec): Complex query plan rewriting with ORDER BY validation, metadata column remapping, and conditional metadata-only query generation; ensure ordering specification correctness and integration with existing filter/project pushdowns.
Push-down Variable Tracking (ClpExpression, ClpFilterToKqlConverter): Extensive refactoring of expression handling to propagate push-down variables through all translation paths; verify completeness of coverage across all expression types and correct aggregation in combined expressions (AND/OR/IN).
YAML Metadata Provider (ClpYamlMetadataProvider): File parsing, caching, error fallback behavior with Jackson YAML factory; validate thread-safety of lazy initialization and correct relative path resolution for nested schema definitions.
New Provider Bindings (ClpModule): Map-based binding strategy replacing conditionals; ensure all provider types are correctly mapped and error handling for unsupported types is consistent.
Test Infrastructure Changes: UUID-based spill path isolation across 7+ test files; verify UUID generation doesn't introduce flakiness and spill cleanup logic handles unique directories correctly.

Possibly related PRs

feat: Add support for defining and using specific columns for metadata filtering. #19: Modifies the CLP pushdown/filtering stack (ClpExpression and ClpFilterToKqlConverter) and the split-filter API (checkContainsRequiredFilters/remapping) with related code-level changes.

Suggested reviewers

kirkrodrigues

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch ci-snapshot

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41aa302 and 62303d8.

📒 Files selected for processing (70)

.github/bin/download_nodejs (2 hunks)
.github/dockerfiles/yscope-presto-builder.dockerfile (1 hunks)
.github/workflows/ci.yml (1 hunks)
.github/workflows/compute-builder-tag.yml (1 hunks)
.github/workflows/create-builder-image.yml (1 hunks)
.github/workflows/docs.yml (1 hunks)
.github/workflows/integration-tests.yml (1 hunks)
.github/workflows/maven-checks.yml (0 hunks)
.github/workflows/pr-title-checks.yaml (1 hunks)
.github/workflows/prestissimo-worker-images-build.yml (0 hunks)
.github/workflows/presto-build.yml (1 hunks)
.github/workflows/prestocpp-format-and-header-check.yml (1 hunks)
.github/workflows/prestocpp-linux-build-and-unit-test.yml (2 hunks)
.github/workflows/tests.yml (2 hunks)
pom.xml (2 hunks)
presto-clp/pom.xml (2 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (3 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (5 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (6 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (6 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (18 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (5 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (5 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java (1 hunks)
presto-clp/src/test/resources/test-cockroachdb-schema.yaml (1 hunks)
presto-clp/src/test/resources/test-orders-schema1.yaml (1 hunks)
presto-clp/src/test/resources/test-orders-schema2.yaml (1 hunks)
presto-clp/src/test/resources/test-pinot-split-filter.json (1 hunks)
presto-clp/src/test/resources/test-tables-schema.yaml (1 hunks)
presto-clp/src/test/resources/test-topn-split-filter.json (1 hunks)
presto-clp/src/test/resources/test-users-schema1.yaml (1 hunks)
presto-native-execution/pom.xml (1 hunks)
presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile (1 hunks)
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeAsyncDataCacheCleanupAPI.java (2 hunks)
presto-native-execution/velox (1 hunks)
presto-native-tests/src/test/java/com/facebook/presto/nativetests/TestDistributedEngineOnlyQueries.java (1 hunks)
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestEngineOnlyQueries.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedQueriesIndexed.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedSpilledQueries.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestH2QueryRunner.java (1 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithHighMemoryRevokingThreshold.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithLargeBlockSpillingEnabled.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithPreprocessingEnabled.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledTopNQueries.java (2 hunks)
presto-thrift-connector/src/test/java/com/facebook/presto/connector/thrift/integration/TestThriftDistributedQueriesIndexed.java (2 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

jackluo923 and others added 17 commits October 28, 2025 12:00

feat: Upgrade velox submodule to y-scope/velox@52bb2f (#85)

5a861d1

fix: Rewrite CLP UDFs across full plan tree (#86)

c8aad93

refactor: Make Pinot query endpoint URL construction overridable (#88)

e3082e5

Refactor: Auto-correct code style violations for consistency and read…

a3deb6d

…ability.

fix: Add metadata SQL generation for string literals in ClpFilterToKq…

3105485

…lConverter (#90)

feat: Add Pinot split filter providers with TEXT_MATCH support (#101)

519a803

refactor: Improve ClpYamlMetadataProvider cache organization and fix …

dd4dda1

…test failures (#103)

[native] Disable TestPrestoNativeAsyncDataCacheCleanupAPI

221dfe5

jackluo923 closed this Nov 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: restructure workflows to use unified builder image on self-hosted runners#112

ci: restructure workflows to use unified builder image on self-hosted runners#112
jackluo923 wants to merge 17 commits into
release-0.293-clp-connectorfrom
ci-snapshot

jackluo923 commented Nov 28, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

jackluo923 commented Nov 28, 2025

Uh oh!

coderabbitai Bot commented Nov 28, 2025 •

edited

Loading

Review failed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

jackluo923 commented Nov 28, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Comparison: Three CI Approaches

Design Improvements

1. Self-Hosted Runners with Pre-Built Dependencies

2. Branch-Agnostic CI

3. Automatic Version Inference

4. Configurable Version Type

5. Unified Builder Image with Hash-Based Tagging

Performance Benefits

ccache Pre-warming

Artifact-Based Image Building

Limitations of Current CI (release-0.293-clp-connector-snapshot)

Maintenance Benefits

Job Dependency Graph

Artifacts and Images

Artifacts (1-day retention)

Docker Images (ghcr.io)

Summary by CodeRabbit

Release Notes

Uh oh!

jackluo923 commented Nov 28, 2025

Uh oh!

coderabbitai Bot commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jackluo923 commented Nov 28, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Nov 28, 2025 •

edited

Loading