Skip to content

ci: restructure workflows to use unified builder image on self-hosted runners#110

Merged
jackluo923 merged 3 commits into
release-0.293-clp-connector-snapshotfrom
ci-snapshot
Nov 29, 2025
Merged

ci: restructure workflows to use unified builder image on self-hosted runners#110
jackluo923 merged 3 commits into
release-0.293-clp-connector-snapshotfrom
ci-snapshot

Conversation

@jackluo923
Copy link
Copy Markdown
Member

@jackluo923 jackluo923 commented Nov 28, 2025

Summary

Restructure CI workflows to use a unified builder image on self-hosted runners, replacing branch-specific CI.

Problems with Current CI

The existing CI in release-0.293-clp-connector-snapshot has several limitations:

  1. Branch Lock-in: Only works for release-0.293-clp-connector-snapshot* branches
  2. Slow Builds: Rebuilds everything from scratch (~3+ hours end-to-end)
  3. Cache Bandwidth Bottleneck: Downloading ccache/Maven caches from remote storage on every run causes bandwidth issues with parallel builds
  4. Single-Purpose Image Tags: Images tagged only by branch name—no way to pin a specific build for production while also having a "latest" tag for development
  5. Hardcoded Dependencies: Builder image pinned to upstream Docker Hub, requires manual updates
  6. Limited Resources: GitHub-hosted runners have limited CPU/RAM

Related Work

This PR builds on ideas from two previous efforts:

y-scope/velox#45 — Migrated Velox CI to ephemeral containerized builds:

  • Dependency-hash-based image tagging (only rebuild when deps change)
  • Pre-warmed caches baked into Docker image layers
  • Ephemeral execution for reproducible builds

y-scope/presto#81 — Release workflow automation for Presto:

  • Automatic version extraction from pom.xml
  • Coordinated artifact publishing (Maven, Docker, GitHub releases)
  • Consistent version tagging across all artifacts

This PR combines these approaches: continuous image publishing on every push (not just releases) with version-TYPE-timestamp-hash tagging, plus the caching and ephemeral build strategies from the Velox work.

Solution Overview

Problem Solution
Branch lock-in Branch-agnostic CI that works on any branch
Slow builds Pre-warmed caches baked into builder image
Cache bandwidth bottleneck Caches in Docker layers—downloaded once per host, then locally cached
Single-purpose image tags Dual tagging: immutable version-TYPE-timestamp-hash + mutable SNAPSHOT; multiple streams (RELEASE, BETA, DEV)
Hardcoded builder version Auto-computed dependency hash triggers rebuilds only when deps change
Limited resources Self-hosted runners with dedicated hardware

Key Design Decisions

Caching Strategy: Bake into Docker Image

The bandwidth problem: With Apache Infra's stash service, every CI run downloads the full ccache (~2GB+) and Maven cache. When running many parallel jobs, this saturates network bandwidth and slows everything down.

The solution: Bake caches into Docker image layers:

  • Docker layer caching means each host downloads the cache layer once, then reuses it for all subsequent runs
  • The builder image is based on GitHub's runner image, which is always pre-cached on self-hosted runners—so only our added layers need downloading
  • Parallel jobs on the same host share the cached layers with zero additional network traffic

Image Tagging Strategy: Dual Tags + Version Streams

Runtime images serve multiple purposes that require different tagging strategies:

Use Case Tag Type Example
Pin specific build for production Immutable 0.293-BETA-20250522140509-484b00e
Always pull latest for dev/testing Mutable SNAPSHOT 0.293-BETA-SNAPSHOT
Choose stability level Version stream RELEASE (stable), BETA (testing), DEV (experimental)

Comparison with Other Approaches

Three CI Approaches

Aspect Upstream (prestodb/presto) release-0.293-clp-connector-snapshot This PR
Runners GitHub-hosted (ephemeral) GitHub-hosted (ubuntu-22.04) Self-hosted (ephemeral)
CI Structure Separate independent workflows Separate independent workflows Unified ci.yml orchestrator
Builder Image presto-native-dependency (C++ only) Hardcoded from upstream Docker Hub unified-builder (C++ + Maven + caches)
Builder Image Tag Pinned version-timestamp-hash N/A (uses upstream image) Auto-computed dependency hash
Runtime Image Tag Release version only (e.g., 0.292) Branch name only version-TYPE-timestamp-hash per build
ccache Strategy Stash/restore via Apache Infra Stash/restore via Apache Infra Pre-warmed in builder image
Branch Support Any branch Only release-0.293-clp-connector-snapshot* Any branch

Unified Builder vs Upstream

Aspect Upstream presto-native-dependency Our unified-builder
Base CentOS/Ubuntu minimal GitHub Actions runner image
C++ deps ✓ Installed ✓ Installed
Maven deps ✗ Not included ✓ Pre-downloaded
ccache ✗ Empty ✓ Pre-warmed (~90% hit rate)
Node.js/Yarn ✗ Not included ✓ Installed

Implementation Details

Workflow Structure

  • ci.yml: Main orchestrator calling reusable workflows
  • compute-builder-tag.yml: Computes image tag from dependency file hashes
  • create-builder-image.yml: Builds/pushes image if tag doesn't exist
  • presto-build.yml: Java build with artifact upload and Docker image push
  • prestocpp-linux-build-and-unit-test.yml: C++ build, tests, and prestissimo image
  • integration-tests.yml: E2E tests using pre-built artifacts
  • tests.yml: Java unit tests

Job Dependency Graph

compute-builder-tag ─► create-builder-image ─┬─► prestocpp ────┬─► integration-tests
                                             ├─► presto ───────┘
                                             └─► presto-tests

Builder Image Contents

The unified-builder image includes:

  • C++ dependencies from setup-ubuntu.sh scripts
  • Pre-warmed ccache from full prestocpp build (~90% hit rate)
  • Pre-downloaded Maven dependencies
  • Node.js/Yarn for frontend builds
  • Built on GitHub Actions runner base image

Results

Metric Old CI New CI Improvement
C++ Build Cold + Apache stash Pre-warmed ccache ~50% faster
Maven Build Download deps each run Pre-downloaded ~30% faster
Image Building Build from scratch (~60 min) Download artifacts (~5 min) ~90% faster

Outputs

Artifacts (1-day retention):

Artifact Contents
presto-server presto-server-*.tar.gz
presto-cli presto-cli-*-executable.jar
presto-native-build presto_server, velox_functions_remote_server_main

Docker Images (ghcr.io):

Image Description
unified-builder Build environment with all dependencies
presto Java coordinator runtime
prestissimo C++ native worker runtime

Note: Images are only pushed on push events (not PRs) for security.

Other Changes

  • Temporarily disabled TestPrestoNativeClpGeneralQueries.test due to a timestamp filtering issue in the CLP native reader. A separate PR will fix this.

Test Plan

  • CI workflows run successfully on the ci-snapshot branch
  • Builder image is created and cached correctly
  • Java unit tests pass (presto-tests with Java 8 and 17)
  • C++ unit tests pass (prestocpp)
  • Integration tests pass

Contributor checklist

  • Submission complies with contributing guide
  • PR description addresses the issue accurately and concisely
  • Documented new properties, syntax, functions, or functionality
  • Release notes follow guidelines (if required)
  • Adequate tests were added if applicable
  • CI passed

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Nov 28, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ci-snapshot

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

… runners

Key changes:
- Add comprehensive CI architecture documentation explaining:
  - Terminology (presto, prestocpp, prestissimo)
  - Unified builder image strategy for ephemeral runners
  - Job dependency graph
  - Comparison with upstream (prestodb/presto) CI
  - Performance benefits of pre-warmed ccache and Maven deps

- Consolidate presto-java8/java17 jobs into single matrix-based `presto` job
  - ARTIFACT_JAVA_VERSION controls which version uploads artifacts/images (default: '8')

- Add prestissimo image building to prestocpp workflow
  - Downloads artifacts from build job, packages into runtime image
  - Uses same tagging strategy as presto image (immutable + SNAPSHOT tags)

- Centralize IMAGE_VERSION_TYPE configuration (set to 'BETA')
  - Applied to both presto and prestissimo images

- Document artifacts (presto-server, presto-cli, presto-native-build)
  and Docker images (unified-builder, presto, prestissimo)
Disable the CLP integration test due to a timestamp filtering issue in
the CLP native reader where the query returns empty results when
filtering by timestamp.

A separate PR will be raised to fix this unit test.
Restructure header comments to explain key design decisions:
- Caching strategy: why bake caches into Docker image layers
- Image tagging: dual tags (immutable + SNAPSHOT) and version streams
- Builder image tag: auto-computed dependency hash

Also update comparison table with split image tag rows (builder vs runtime).
@jackluo923 jackluo923 merged commit 387b268 into release-0.293-clp-connector-snapshot Nov 29, 2025
2 checks passed
@jackluo923 jackluo923 deleted the ci-snapshot branch November 29, 2025 01:28
@jackluo923 jackluo923 restored the ci-snapshot branch November 29, 2025 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants