Skip to content

ci: restructure workflows to use unified builder image on self-hosted runners#112

Closed
jackluo923 wants to merge 17 commits into
release-0.293-clp-connectorfrom
ci-snapshot
Closed

ci: restructure workflows to use unified builder image on self-hosted runners#112
jackluo923 wants to merge 17 commits into
release-0.293-clp-connectorfrom
ci-snapshot

Conversation

@jackluo923
Copy link
Copy Markdown
Member

@jackluo923 jackluo923 commented Nov 28, 2025

Summary

Restructure CI workflows to use a unified builder image on self-hosted runners, replacing the branch-specific CI in release-0.293-clp-connector-snapshot.

Comparison: Three CI Approaches

Aspect Upstream (prestodb/presto) release-0.293-clp-connector-snapshot This PR (ci-snapshot)
Runners GitHub-hosted (ephemeral) GitHub-hosted (ubuntu-22.04) Self-hosted (ephemeral)
CI Structure Separate independent workflows Separate independent workflows Unified ci.yml orchestrator
Builder Image presto-native-dependency (C++ deps only, no caches) Hardcoded from upstream Docker Hub: prestodb/presto-native-dependency:0.293-... (C++ deps only, no caches) unified-builder (C++ deps + Maven deps + pre-warmed ccache, built on GitHub runner base)
Image Tag Strategy Hash-based, manual update Branch name only (e.g., release-0.293-clp-connector-snapshot) Version + timestamp + hash (e.g., 0.293-BETA-20250522140509-484b00e)
ccache Strategy Stash/restore via Apache Infra Stash/restore via Apache Infra Pre-warmed in builder image
Branch Support Any branch Only release-0.293-clp-connector-snapshot* branches Any branch
Image Building On release only Build from scratch (~1hr) Download artifacts + package (~5min)

Design Improvements

1. Self-Hosted Runners with Pre-Built Dependencies

Switching from GitHub-hosted to self-hosted ephemeral runners with a unified builder image:

Aspect GitHub-hosted (Current) Self-hosted + Builder Image (This PR)
Resources Limited (7GB RAM, 2 CPU) Dedicated hardware (more CPU/RAM)
Dependencies Install each run Pre-installed in builder image
ccache Restore from Apache stash Pre-warmed in builder image
Maven deps Download each run Pre-downloaded in builder image
Build time Slower (cold start) Faster (warm cache)

Benefits:

  • Dedicated hardware with more resources for faster builds
  • Pre-warmed caches eliminate cold start overhead
  • Self-contained: no external stash service dependencies

2. Branch-Agnostic CI

The old CI in release-0.293-clp-connector-snapshot only works for that specific branch:

# Old: Branch-locked
push: ${{ startsWith(github.ref, 'refs/heads/release-0.293-clp-connector-snapshot') }}

The new CI works on any branch, enabling feature branch testing and multi-version support.

3. Automatic Version Inference

Images are automatically tagged with meaningful versions extracted from pom.xml:

  • Immutable tag: <version>-<TYPE>-<timestamp>-<hash> (e.g., 0.293-BETA-20250527143021-62303d8)
  • SNAPSHOT tag: <version>-<TYPE>-SNAPSHOT (e.g., 0.293-BETA-SNAPSHOT)

No manual version management required. Each commit produces a uniquely identifiable image.

4. Configurable Version Type

Single configuration point for image version type:

env:
  IMAGE_VERSION_TYPE: 'BETA'  # Options: RELEASE, BETA, DEV

Applied consistently to both presto and prestissimo images.

5. Unified Builder Image with Hash-Based Tagging

The builder image is fundamentally different from upstream's presto-native-dependency:

Aspect Upstream presto-native-dependency Our unified-builder
Base CentOS/Ubuntu minimal GitHub Actions runner image
C++ deps ✓ Installed ✓ Installed
Maven deps ✗ Not included ✓ Pre-downloaded
ccache ✗ Empty ✓ Pre-warmed (~90% hit rate)
Node.js/Yarn ✗ Not included ✓ Installed
Docker layer cache ✗ No ✓ Optimized layers

The builder image tag is computed from dependency file hashes:

  • Only rebuilt when dependencies actually change
  • Eliminates hardcoded image versions that drift out of sync
  • Self-contained: no external stash service dependencies

Performance Benefits

Metric Old CI (snapshot branch) New CI Improvement
C++ Build Cold build + Apache stash restore Pre-warmed ccache (~90% hit rate) ~50% faster
Maven Build Download deps each run Pre-downloaded in builder image ~30% faster
Image Building Build from scratch (~60 min) Download artifacts + package (~5 min) ~90% faster
Dependency Install Per-workflow installation Pre-built unified image Eliminated

ccache Pre-warming

The builder image contains a pre-warmed ccache from a previous successful build. Incremental builds achieve high cache hit rates, dramatically reducing C++ compilation time.

Artifact-Based Image Building

Instead of rebuilding prestissimo from scratch for the Docker image:

Old: checkout → install deps → cmake → ninja → docker build (~60 min)
New: download artifact → docker build (~5 min)

Limitations of Current CI (release-0.293-clp-connector-snapshot)

  1. Branch Lock-in: Only works for release-0.293-clp-connector-snapshot* branches

    • Feature branches cannot produce Docker images
    • Cannot easily test CI changes on other branches
  2. Hardcoded Dependencies: Builder image version is hardcoded to upstream's prestodb/presto-native-dependency:0.293-... from Docker Hub. This image is built by upstream prestodb/presto CI and requires manual updates when dependencies change. The snapshot branch is pinned to an older 0.293 version while upstream master has moved to 0.296+

  3. No Version Tracking: Images tagged only by branch name, no timestamp/hash for traceability

  4. Slow Image Builds: Rebuilds everything from scratch for Docker images (~1 hour)

  5. External Service Dependency: Relies on Apache Infrastructure stash service for ccache

  6. Limited Resources: GitHub-hosted runners have limited CPU/RAM compared to dedicated hardware

Maintenance Benefits

  1. Self-Contained: No external stash service dependencies
  2. Auto-Updating Builder: Hash-based tags ensure builder image stays in sync with dependencies
  3. Reproducible: Immutable image tags enable exact environment recreation
  4. Documented: Comprehensive inline documentation for novice developers
  5. Configurable: Single-point configuration for version type and Java version

Job Dependency Graph

compute-builder-tag ─► create-builder-image ─┬─► prestocpp ────┬─► integration-tests
                                             ├─► presto ───────┘
                                             └─► presto-tests

prestocpp: uploads presto-native-build artifact, builds prestissimo image
presto: uploads presto-server/presto-cli artifacts, builds presto image

Artifacts and Images

Artifacts (1-day retention)

Artifact Contents
presto-server presto-server-*.tar.gz
presto-cli presto-cli-*-executable.jar
presto-native-build presto_server, velox_functions_remote_server_main

Docker Images (ghcr.io)

Image Description
unified-builder Build environment with all dependencies
presto Java coordinator runtime
prestissimo C++ native worker runtime

Summary by CodeRabbit

Release Notes

  • New Features

    • Added YAML-based metadata support for CLP plugin configuration
    • Introduced Pinot storage backend support for CLP splits and metadata
    • Enabled Top-N query optimization pushdown for improved performance
  • Bug Fixes

    • Updated time-zone handling for date and time literal edge cases
  • Chores

    • Updated Joda-Time dependency to 2.13.1
    • Added SnakeYAML version constraint for improved security and compatibility

✏️ Tip: You can customize this high-level summary in your review settings.

jackluo923 and others added 17 commits October 28, 2025 12:00
…ilds

Change Docker image tagging from static :dev tags to dynamic branch-based tags,
and update all workflow conditions to accept any branch with the
release-0.293-clp-connector-snapshot prefix.

Changes:
- Docker image tags now use actual branch name via type=ref,event=branch
  * Before: ghcr.io/repo/prestissimo-worker:dev
  * After: ghcr.io/repo/prestissimo-worker:release-0.293-clp-connector-snapshot-nov29
- Push conditions changed from exact match to prefix match using startsWith()
- Cancel-in-progress logic updated to exclude all snapshot prefix branches
- PR title checks now accept release-0.293-clp-connector-snapshot* pattern

Example: Pushing to branch "release-0.293-clp-connector-snapshot-nov29" will:
- Build and push images tagged with :release-0.293-clp-connector-snapshot-nov29
- Run all CI jobs to completion without cancellation
- Enable PR title validation for PRs targeting this branch

This allows multiple snapshot branches to coexist with unique image tags.
  Add alternative provider implementations for metadata and split discovery,
  enabling flexible deployment configurations without MySQL dependency.

  New providers:
  - ClpYamlMetadataProvider: Reads table/column metadata from YAML config files
    instead of querying MySQL database. Supports nested schemas and polymorphic
    types through hierarchical YAML structure.

  - ClpPinotSplitProvider: Queries Apache Pinot database for archive file paths
    via HTTP/JSON API instead of MySQL. Determines split types based on file
    extensions (.clp.zst for IR, otherwise ARCHIVE).

  Configuration:
  - Add clp.metadata-provider-type=YAML option for YAML-based metadata
  - Add clp.split-provider-type=PINOT option for Pinot-based split discovery
  - Add clp.metadata-yaml-path config for YAML metadata file location

  The YAML metadata provider parses a two-level structure:
  1. Main metadata file: maps connector → schema → table → schema_file_path
  2. Individual table schema files: define column names and types

  This provides deployment flexibility - users can now choose between:
  - Database-driven (MySQL for both metadata and splits)
  - Hybrid (YAML metadata + Pinot splits)
  - Or any combination based on their infrastructure

  Dependencies: Adds jackson-dataformat-yaml and snakeyaml 2.1
Ensure all timestamp literals are converted to nanoseconds before generating
KQL and SQL filters for pushdown, regardless of their original precision
(milliseconds, microseconds, etc.).

Problem:
Presto supports multiple timestamp precisions (TIMESTAMP with millisecond
precision, TIMESTAMP_MICROSECONDS, etc.), but CLP's query engine expects
timestamps in a consistent nanosecond format. Previously, timestamp literals
were passed through without precision normalization, causing incorrect
results when different timestamp types were used in queries.

Solution:
Add timestamp normalization logic in ClpFilterToKqlConverter:
- tryEnsureNanosecondTimestamp(): Detects TIMESTAMP and TIMESTAMP_MICROSECONDS types
- ensureNanosecondTimestamp(): Converts timestamp values to nanoseconds using:
  1. Extract epoch seconds via TimestampType.getEpochSecond()
  2. Extract nanosecond fraction via TimestampType.getNanos()
  3. Convert to total nanoseconds: SECONDS.toNanos(seconds) + nanosFraction

Applied to:
- BETWEEN operator: Both lower and upper bounds
- Comparison operators: All timestamp literal values (=, !=, <, >, <=, >=)

This ensures consistent timestamp representation in both KQL filters (sent to
CLP engine) and SQL filters (used for metadata filtering), regardless of the
timestamp precision used in the original Presto query.

Example:
Query: WHERE timestamp BETWEEN TIMESTAMP '2025-01-01' AND TIMESTAMP '2025-01-02'
Before: Literal values passed as milliseconds or microseconds
After: Both bounds normalized to nanosecond representation
   Implement a plan optimizer that pushes down LIMIT + ORDER BY operations
   to reduce split scanning when querying on timestamp metadata columns.

   The optimizer detects TopN patterns in query plans and uses archive
   metadata (timestamp bounds, message counts) to select only the minimum
   set of archives needed to guarantee correct results.

   Key features:
   - Detects and rewrites TopN → [Project] → [Filter] → TableScan patterns
   - Pushes TopN specifications through ClpTableLayoutHandle
   - Handles overlapping archives by merging into non-overlapping groups
   - Supports both ASC (earliest-first) and DESC (latest-first) ordering
   - Uses worst-case analysis to ensure correctness when archive ranges overlap

   Implementation:
   - ClpTopNSpec: Data structure for TopN specifications (limit + orderings)
   - ClpComputePushDown: Enhanced plan optimizer with TopN detection
   - ClpMySqlSplitProvider: Archive selection logic using metadata queries
   - Comprehensive test coverage for various TopN scenarios

   Performance: For queries like "SELECT * FROM logs ORDER BY timestamp DESC
   LIMIT 100", this eliminates scanning of archives that cannot contain
   the top results, significantly reducing I/O.
Implement TopN pushdown optimization in ClpPinotSplitProvider to minimize
archive scanning when executing LIMIT + ORDER BY queries on timestamp
metadata columns.

The optimization uses archive metadata (timestamp bounds and message counts)
to intelligently select only the minimum set of archives needed to guarantee
correct TopN results, significantly reducing I/O for time-range queries.

Implementation details:

Archive selection algorithm:
- Fetches archive metadata (file path, creation time, modification time, message count)
- Merges overlapping archives into non-overlapping groups by timestamp ranges
- For DESC ordering: includes newest group + older groups until limit is covered
- For ASC ordering: includes oldest group + newer groups until limit is covered
- Uses worst-case analysis to ensure correctness when archives overlap

Code structure:
- ClpPinotSplitProvider.listSplits(): Detects TopN specs and routes to optimized path
- selectTopNArchives(): Implements the archive selection algorithm
- toArchiveGroups(): Merges overlapping archives into logical components
- ArchiveMeta: Represents archive metadata with validation
- ArchiveGroup: Represents merged archive groups for overlap handling

ClpTopNSpec enhancements:
- Made fields private with proper encapsulation
- Added @JsonProperty annotations to getters for correct serialization
- Maintains immutability for thread-safety

Code quality improvements:
- Proper exception handling with context-aware error messages
- Input validation in constructors (bounds checking, null checks)
- Extracted determineSplitType() helper to eliminate duplication
- Made fields final where immutable (pinotDatabaseUrl, ArchiveMeta fields)
- Improved logging with table names and SQL queries for debugging
- Better encapsulation: private fields, getters-only access

Performance impact:
For queries like "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100",
this eliminates scanning of archives outside the time range of interest,
providing substantial I/O reduction for large archive sets.
Implements dynamic schema discovery from YAML metadata files, allowing the CLP connector
to support multiple schemas beyond just "default" (e.g., clp.dev.table, clp.prod.table).

Key changes:
- Add default listSchemaNames() method to ClpMetadataProvider interface for DRY principle
- Implement dynamic schema discovery in ClpYamlMetadataProvider with thread-safe caching
- Update ClpMetadata to delegate schema listing to metadata providers
- Add comprehensive error handling with graceful fallback to default schema
- Optimize performance with double-checked locking and ObjectMapper reuse
- Add 15+ unit tests covering all edge cases and error scenarios
- Fix checkstyle violations in UberClpPinotSplitProvider and ClpPinotSplitProvider

The implementation is backward compatible - existing single-schema setups continue to work
without configuration changes. Multi-schema support is activated automatically when the
YAML metadata file contains multiple schema entries.

Example YAML structure:
```yaml
clp:
  default:
    logs: /path/to/default/logs.yaml
  dev:
    test_logs: /path/to/dev/logs.yaml
  prod:
    production_logs: /path/to/prod/logs.yaml
```

Performance optimizations:
- Thread-safe caching prevents repeated YAML parsing
- Double-checked locking pattern for lazy initialization
- Reused ObjectMapper instances reduce object creation overhead
- Synchronized table schema map updates ensure thread safety

Testing:
- All 11 tests in TestClpYamlMetadataProvider pass
- All 4 tests in TestClpMultiSchema pass
- Fixed testListSchemaNamesNoCatalogField to handle missing catalog gracefully
…restodb#26254)

## Description
Fix flakytests that uses `experimental.spiller-spill-path` as
`/tmp/presto/spills/` in query runner

## Motivation and Context
prestodb#25890
## Impact
<!---Describe any public API or user-facing feature change or any
performance impact-->

## Test Plan
<!---Please fill in how you tested your change-->

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== NO RELEASE NOTE ==
```
Upgrade joda-time to 2.13.1 to support new timezone data including America/Coyhaique.

Update tests to use dates where timezone gaps still exist in the updated tzdata:
- testLocallyUnrepresentableDateLiterals: 1970-01-01 -> 1932-04-01
- testLocallyUnrepresentableTimeLiterals: use 2017-04-02 or 2012-04-01

Cherry-picked from upstream: prestodb/presto@0a504d20e6
… runners

Key changes:
- Add comprehensive CI architecture documentation explaining:
  - Terminology (presto, prestocpp, prestissimo)
  - Unified builder image strategy for ephemeral runners
  - Job dependency graph
  - Comparison with upstream (prestodb/presto) CI
  - Performance benefits of pre-warmed ccache and Maven deps

- Consolidate presto-java8/java17 jobs into single matrix-based `presto` job
  - ARTIFACT_JAVA_VERSION controls which version uploads artifacts/images (default: '8')

- Add prestissimo image building to prestocpp workflow
  - Downloads artifacts from build job, packages into runtime image
  - Uses same tagging strategy as presto image (immutable + SNAPSHOT tags)

- Centralize IMAGE_VERSION_TYPE configuration (set to 'BETA')
  - Applied to both presto and prestissimo images

- Document artifacts (presto-server, presto-cli, presto-native-build)
  and Docker images (unified-builder, presto, prestissimo)
@jackluo923
Copy link
Copy Markdown
Member Author

Duplicate, updating PR #110 instead

@jackluo923 jackluo923 closed this Nov 28, 2025
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Nov 28, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This PR introduces a comprehensive CI/CD restructuring with a unified builder image strategy featuring dependency-based caching, extensive CLP plugin enhancements including YAML metadata support and Pinot split providers with TopN pushdown optimization, and test infrastructure improvements with UUID-based spill path isolation to prevent file collisions.

Changes

Cohort / File(s) Summary
GitHub Actions CI Workflow Restructuring
.github/workflows/ci.yml, .github/workflows/compute-builder-tag.yml, .github/workflows/create-builder-image.yml, .github/workflows/presto-build.yml, .github/workflows/prestocpp-linux-build-and-unit-test.yml, .github/workflows/tests.yml, .github/workflows/integration-tests.yml
New unified builder image CI pipeline with dependency-based tag computation and conditional image building; orchestrates multi-job builds for Java (presto) and C++ (prestocpp) components with shared pre-warmed caches; consolidates Maven/Node.js dependency pre-downloading.
Workflow File Removals & Updates
.github/workflows/maven-checks.yml (removed), .github/workflows/prestissimo-worker-images-build.yml (removed), .github/workflows/docs.yml, .github/workflows/pr-title-checks.yaml, .github/workflows/prestocpp-format-and-header-check.yml
Removes legacy Maven-checks and prestissimo image workflows; updates concurrency rules to use prefix-based branch exclusion for snapshot branches.
Download & Docker Build Support
.github/bin/download_nodejs, .github/dockerfiles/yscope-presto-builder.dockerfile
Adds configurable MAVEN_REPO support for cached Node.js/Yarn downloads; introduces unified builder Dockerfile with pre-warmed ccache, pre-downloaded dependencies (JDK, Node.js, Maven), and CMake setup for both presto and prestocpp.
CLP Plugin Configuration & Metadata
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java
Adds metadataYamlPath configuration; extends enums (MetadataProviderType.YAML); introduces YAML-based metadata provider with schema/table discovery from YAML files; adds default listSchemaNames method to interface.
CLP Module Binding & Split Framework
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
Introduces map-based provider bindings for metadata/split/split-filter types; adds equals/hashCode to ClpSplit; updates listSchemaNames to delegate to provider.
CLP Pinot Split Providers
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
Adds Pinot-based split providers for standard and Uber variants; implements TopN archive selection with overlap merging; Uber variant customizes Pinot endpoint and metadata table naming.
CLP Split Filter Providers
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java
Adds remapColumnName abstract method to API; updates checkContainsRequiredFilters to accept Set of push-down variables; introduces Pinot and Uber Pinot implementations; Uber variant adds TEXT_MATCH transformations for merged text index queries.
CLP TopN Pushdown Optimization
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
Adds ClpTopNSpec class modeling limit and ordering; extends ClpTableLayoutHandle with metadataQueryOnly and topN fields; implements extensive TopN visit handler in optimizer with ordering validation and metadata SQL generation.
CLP Expression & Filter Conversion
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
Adds pushDownVariables tracking to ClpExpression with new constructor overloads; propagates push-down state through all translation paths; adds nanosecond-precision timestamp normalization for KQL/metadata SQL generation.
CLP Optimization Refactoring
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
Consolidates subtree rewriting logic into visitor-based traversal; adds explicit TableScanNode handler; removes internal rewriting helpers.
CLP Test Resources
presto-clp/src/test/resources/test-*.yaml, presto-clp/src/test/resources/test-*.json
Adds YAML test schemas for default/multiple schemas, orders, users, cockroachdb; adds JSON split-filter configurations for Pinot and TopN test scenarios.
CLP Test Classes
presto-clp/src/test/java/.../TestClpTopN.java, presto-clp/src/test/java/.../TestClpMultiSchema.java, presto-clp/src/test/java/.../TestClpYamlMetadata.java, presto-clp/src/test/java/.../TestClp*.java (multiple)
Adds comprehensive test classes for TopN optimization, YAML metadata discovery, Pinot split providers (standard/Uber), split filter providers; includes schema/table/split discovery and transformation validation.
CLP Test Database Setup
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
Extends archives table schema with messageCount column; updates INSERT/MERGE logic and ArchivesTableRow constructor.
Test Infrastructure UUID Updates
presto-tests/src/test/java/com/facebook/presto/tests/TestDistributed*Spilled*.java, presto-tests/src/test/java/com/facebook/presto/tests/TestSpilled*.java, presto-thrift-connector/src/test/java/.../TestThriftDistributedQueriesIndexed.java
Appends UUID to spill paths across multiple test classes to ensure per-run directory isolation and prevent file collisions.
Timezone Test Updates
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestEngineOnlyQueries.java, presto-native-tests/src/test/java/.../TestDistributedEngineOnlyQueries.java, presto-tests/src/test/java/.../TestH2QueryRunner.java
Updates test dates for unrepresentable time scenarios (1970-01-01 → 1932-04-01, 2012-04-01 → 2017-04-02); removes java.util.Objects import.
Runtime & Dependencies
pom.xml, presto-clp/pom.xml, presto-native-execution/pom.xml, presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile, presto-native-execution/velox
Updates Joda-Time to 2.13.1; adds jackson-dataformat-yaml and SnakeYAML 2.1 upper-bound dependency management; refactors Prestissimo runtime Dockerfile to use pre-built binary and simplified runtime library extraction.
Test Configuration & Disabled Tests
presto-native-execution/src/test/java/.../TestPrestoNativeAsyncDataCacheCleanupAPI.java
Disables two test methods via enabled = false annotation.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas requiring extra attention:

  • CI Workflow Orchestration (.github/workflows/ci.yml, compute-builder-tag, create-builder-image): Complex multi-job dependency graph with cross-workflow communication, caching strategy validation, and artifact flow; verify builder image tag computation is deterministic and cache hits occur as expected.
  • CLP Pinot Integration (ClpPinotSplitProvider, ClpUberPinotSplitProvider, related tests): New HTTP-based Pinot query execution with archive metadata fetching and TopN archive selection algorithm; validate correctness of overlap merging logic and Uber-specific endpoint/table naming conventions.
  • TopN Pushdown Logic (ClpComputePushDown, ClpTableLayoutHandle, ClpTopNSpec): Complex query plan rewriting with ORDER BY validation, metadata column remapping, and conditional metadata-only query generation; ensure ordering specification correctness and integration with existing filter/project pushdowns.
  • Push-down Variable Tracking (ClpExpression, ClpFilterToKqlConverter): Extensive refactoring of expression handling to propagate push-down variables through all translation paths; verify completeness of coverage across all expression types and correct aggregation in combined expressions (AND/OR/IN).
  • YAML Metadata Provider (ClpYamlMetadataProvider): File parsing, caching, error fallback behavior with Jackson YAML factory; validate thread-safety of lazy initialization and correct relative path resolution for nested schema definitions.
  • New Provider Bindings (ClpModule): Map-based binding strategy replacing conditionals; ensure all provider types are correctly mapped and error handling for unsupported types is consistent.
  • Test Infrastructure Changes: UUID-based spill path isolation across 7+ test files; verify UUID generation doesn't introduce flakiness and spill cleanup logic handles unique directories correctly.

Possibly related PRs

Suggested reviewers

  • kirkrodrigues
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ci-snapshot

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41aa302 and 62303d8.

📒 Files selected for processing (70)
  • .github/bin/download_nodejs (2 hunks)
  • .github/dockerfiles/yscope-presto-builder.dockerfile (1 hunks)
  • .github/workflows/ci.yml (1 hunks)
  • .github/workflows/compute-builder-tag.yml (1 hunks)
  • .github/workflows/create-builder-image.yml (1 hunks)
  • .github/workflows/docs.yml (1 hunks)
  • .github/workflows/integration-tests.yml (1 hunks)
  • .github/workflows/maven-checks.yml (0 hunks)
  • .github/workflows/pr-title-checks.yaml (1 hunks)
  • .github/workflows/prestissimo-worker-images-build.yml (0 hunks)
  • .github/workflows/presto-build.yml (1 hunks)
  • .github/workflows/prestocpp-format-and-header-check.yml (1 hunks)
  • .github/workflows/prestocpp-linux-build-and-unit-test.yml (2 hunks)
  • .github/workflows/tests.yml (2 hunks)
  • pom.xml (2 hunks)
  • presto-clp/pom.xml (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (5 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (6 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (6 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (18 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (5 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (5 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/resources/test-cockroachdb-schema.yaml (1 hunks)
  • presto-clp/src/test/resources/test-orders-schema1.yaml (1 hunks)
  • presto-clp/src/test/resources/test-orders-schema2.yaml (1 hunks)
  • presto-clp/src/test/resources/test-pinot-split-filter.json (1 hunks)
  • presto-clp/src/test/resources/test-tables-schema.yaml (1 hunks)
  • presto-clp/src/test/resources/test-topn-split-filter.json (1 hunks)
  • presto-clp/src/test/resources/test-users-schema1.yaml (1 hunks)
  • presto-native-execution/pom.xml (1 hunks)
  • presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile (1 hunks)
  • presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeAsyncDataCacheCleanupAPI.java (2 hunks)
  • presto-native-execution/velox (1 hunks)
  • presto-native-tests/src/test/java/com/facebook/presto/nativetests/TestDistributedEngineOnlyQueries.java (1 hunks)
  • presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestEngineOnlyQueries.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedQueriesIndexed.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedSpilledQueries.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestH2QueryRunner.java (1 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithHighMemoryRevokingThreshold.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithLargeBlockSpillingEnabled.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithPreprocessingEnabled.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledTopNQueries.java (2 hunks)
  • presto-thrift-connector/src/test/java/com/facebook/presto/connector/thrift/integration/TestThriftDistributedQueriesIndexed.java (2 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants