This document explains the split build approach implemented for the build-hysds_base job as a proof-of-concept. This approach builds x86_64 and ARM64 architectures natively on separate machines in parallel, then combines them into a multi-platform manifest.
build-hysds_base:
- Builds both linux/amd64 and linux/arm64 simultaneously
- ARM64 runs via QEMU emulation (10-50x slower)
- Uses large resource class (4 vCPUs, 8GB RAM)
- Build time: ~30 minutes
- Single executor doing sequential emulated builds
build-hysds_base-amd64:
- Builds only linux/amd64 natively
- Uses default medium resource (2 vCPUs, 4GB RAM)
- Runs on x86_64 Docker executor (docker:24.0.9-git)
- Build time: ~10 minutes
- Native x86_64 execution
build-hysds_base-arm64:
- Builds only linux/arm64 natively
- Uses arm.medium resource (2 vCPUs, 4GB RAM)
- Runs on ARM64 Docker executor (docker:24.0.9-git)
- Build time: ~12 minutes
- Native ARM64 execution (no emulation)
build-hysds_base-manifest:
- Creates multi-platform manifest
- Combines amd64 and arm64 images using docker manifest
- Uses default medium resource
- Build time: <1 minute
- No actual building, just manifest operations
Total time: ~12-13 minutes (amd64 and arm64 run in parallel, followed by the short manifest job)
Speedup: roughly 57-60% less build time (12-13 min vs 30 min), or about 2.3x faster
- Roughly 57-60% shorter builds - 12-13 min vs 30 min
- Native performance - No emulation overhead
- Lower resource usage - Uses default medium instead of large
- True parallelism - Both architectures build simultaneously on separate machines
- Cost effective - Free plan supports arm.medium
Architecture-specific jobs use standard docker build commands, not docker buildx build. This ensures:
- Pure single-platform images are created
- No manifest lists are generated prematurely
- docker manifest create can properly combine the images
- Native architecture of the executor determines the platform
We avoid buildx in architecture-specific jobs because:
- Buildx can create manifest lists even for single-platform builds
- docker manifest create cannot combine existing manifest lists
- Standard docker build on native hardware is simpler and more reliable
- The executor's native architecture automatically determines the platform
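As a rough illustration of the points above, an architecture-specific job could derive its tag suffix from the executor's machine type and then run a plain docker build. This is a minimal sketch, not the project's actual job script: the helper names docker_arch and arch_build_commands are hypothetical, the build context "." is an assumption, and the push step is included only because docker manifest create works against images that already exist in a registry. The script prints the commands instead of running them.

```shell
#!/bin/sh
# Map a kernel machine name (as printed by `uname -m`) to Docker's architecture name.
docker_arch() {
    case "$1" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        *) echo "unsupported machine: $1" >&2; return 1 ;;
    esac
}

# Print the plain `docker build` / `docker push` commands for one architecture.
# Arguments (all illustrative): image name, base tag, machine name.
arch_build_commands() {
    image="$1"; tag="$2"; machine="$3"
    arch="$(docker_arch "$machine")" || return 1
    # No --platform flag and no buildx: the native executor decides the platform.
    echo "docker build -t ${image}:${tag}-${arch} ."
    echo "docker push ${image}:${tag}-${arch}"
}

# Dry run for both executor types; in a real job the third argument
# would be "$(uname -m)".
arch_build_commands hysds/base HC-567 x86_64
arch_build_commands hysds/base HC-567 aarch64
```

Printing rather than executing keeps the sketch safe to run anywhere; in a job, the echoed commands would be run directly.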
┌─────────────────────────┐     ┌─────────────────────────┐
│ build-hysds_base-amd64  │     │ build-hysds_base-arm64  │
│     (x86_64 native)     │     │     (ARM64 native)      │
└───────────┬─────────────┘     └───────────┬─────────────┘
            │                               │
            └───────────────┬───────────────┘
                            ▼
              ┌───────────────────────────┐
              │ build-hysds_base-manifest │
              │      (combines both)      │
              └───────────────────────────┘
Each architecture build creates separate tags:
- hysds/base:HC-567-amd64
- hysds/base:HC-567-arm64
The manifest job combines them into:
- hysds/base:HC-567 (multi-platform manifest)
When users pull hysds/base:HC-567, Docker automatically selects the correct architecture.
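The manifest step described above might look roughly like the following. This is a sketch under assumptions: the helper name manifest_commands is hypothetical, and it assumes both architecture-specific tags have already been pushed to the registry. It only prints the docker manifest create and docker manifest push commands so it can be run without a Docker daemon.

```shell
#!/bin/sh
# Print the commands that combine two single-platform images into one
# multi-platform manifest list. Arguments (illustrative): image name, base tag.
manifest_commands() {
    image="$1"; tag="$2"
    # Create a local manifest list that references both architecture tags.
    echo "docker manifest create ${image}:${tag} ${image}:${tag}-amd64 ${image}:${tag}-arm64"
    # Publish the manifest list under the un-suffixed tag.
    echo "docker manifest push ${image}:${tag}"
}

manifest_commands hysds/base HC-567
```

No image layers are rebuilt here; the manifest list only points at the two existing images, which is why this job finishes in under a minute.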
All major images have been split into 3 jobs each (amd64, arm64, manifest):
- build-hysds_base ✅
- build-hysds_dev ✅
- build-hysds_cuda_base ✅
- build-hysds_cuda_dev ✅
- build-hysds_verdi_pge_base ✅
  - Note: This job builds TWO images (pge-base and verdi), both are handled in the manifest job
- build-hysds_cuda_pge_base ✅
- build-hysds_mozart ✅
- build-hysds_metrics ✅
- build-hysds_grq ✅
- build-hysds_cont_int ✅
- export-support-assets-amd64 / export-support-assets-arm64 / deploy-support-assets ✅
  - Exports registry, logstash, and other support images for both architectures
build-deploy-develop workflow:
- ✅ All base images (base, dev, cuda-base, cuda-dev) - Active
- ✅ PGE/Verdi images (verdi, pge-base, cuda-pge-base) - Active
- ⚠️ Component images (mozart, metrics, grq, cont_int) - Job definitions exist but currently commented out in workflow
build-deploy-release workflow:
- ✅ All images configured with split builds for release tags (v6.*)
Component images (mozart, metrics, grq, cont_int) are fully implemented with split build jobs but are commented out in the build-deploy-develop workflow. To enable them, uncomment the corresponding sections in the workflow configuration.
For each image (e.g., hysds_base), watch for 3 jobs to appear:
- build-hysds_base-amd64 - Builds on x86_64 executor
- build-hysds_base-arm64 - Builds on ARM64 executor (arm.medium resource class)
- build-hysds_base-manifest - Creates multi-platform manifest
- Parallel execution: amd64 and arm64 jobs run simultaneously
- Sequential manifest: manifest job waits for both architecture builds to complete
- Dependency chain: Downstream images wait for manifest completion
- amd64: ~10 minutes
- arm64: ~12 minutes (runs in parallel with amd64)
- manifest: <1 minute
- Total: ~13 minutes (vs 30 minutes with emulated buildx)
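The total above follows from the parallel stage lasting as long as its slower job, plus the manifest step. A small sketch of that arithmetic, using the approximate figures quoted in this document (the manifest time is rounded up to 1 minute, and shell integer division rounds the percentage down):

```shell
#!/bin/sh
# Approximate job durations in minutes, taken from the figures in this document.
amd64_min=10
arm64_min=12
manifest_min=1      # "<1 minute", rounded up
emulated_min=30     # previous single-job buildx build

# The parallel stage takes as long as the slower of the two architecture builds.
if [ "$amd64_min" -gt "$arm64_min" ]; then
    parallel_min=$amd64_min
else
    parallel_min=$arm64_min
fi

total_min=$((parallel_min + manifest_min))
saved_pct=$(( (emulated_min - total_min) * 100 / emulated_min ))

echo "split build total: ${total_min} min"
echo "time saved vs emulated build: ${saved_pct}%"
```

With these inputs the script reports 13 minutes and 56% saved (56.7% before rounding down).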
After builds complete, verify the multi-platform manifest:
docker manifest inspect hysds/base:develop
You should see entries for both linux/amd64 and linux/arm64 platforms.
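This check can also be scripted. The sketch below assumes the JSON printed by docker manifest inspect for a manifest list contains one "architecture" field per platform entry; the helper name has_both_arches is hypothetical, and it matches on architecture only (it does not verify the OS field).

```shell
#!/bin/sh
# Read `docker manifest inspect` JSON on stdin and succeed only if both
# amd64 and arm64 entries are present.
has_both_arches() {
    json="$(cat)"
    for arch in amd64 arm64; do
        # Tolerate optional whitespace after the colon in the JSON output.
        printf '%s\n' "$json" | grep -Eq "\"architecture\":[[:space:]]*\"${arch}\"" || {
            echo "missing linux/${arch}" >&2
            return 1
        }
    done
}

# Usage (requires registry access, so it is left as a comment here):
#   docker manifest inspect hysds/base:develop | has_both_arches && echo "multi-platform OK"
```

A text match is used to avoid depending on a JSON tool such as jq; a stricter check would parse the manifests array properly.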
Branch HC-567 contains the complete split build implementation:
- ✅ All job definitions created for split builds
- ✅ Core images (base, dev, cuda-base, cuda-dev) active in workflows
- ✅ PGE/Verdi images active in workflows
- ✅ Support assets exported for both architectures
- ✅ Release workflow configured for all images
- ⚠️ Component images (mozart, metrics, grq, cont_int) commented out in develop workflow
  - Jobs are defined and ready
  - Need to be uncommented and tested
- Test current active builds - Verify base, dev, cuda, and verdi images build successfully
- Enable component images - Uncomment mozart, metrics, grq, cont_int in workflow
- Monitor build performance - Track build times and resource usage
- Merge to develop - Once all images are verified working
- AMD64 jobs: Default medium (2 vCPUs, 4GB RAM)
- ARM64 jobs: arm.medium (2 vCPUs, 4GB RAM)
- Manifest jobs: Default medium (minimal resources)
Architecture-specific tags:
- hysds/base:develop-amd64
- hysds/base:develop-arm64
- hysds/base:v6.0.0-amd64 (for releases)
- hysds/base:v6.0.0-arm64 (for releases)
Multi-platform manifests:
- hysds/base:develop (points to both architectures)
- hysds/base:latest (for releases)
- hysds/base:v6.0.0 (for releases)
The build-hysds_verdi_pge_base-manifest job exports both architectures:
- hysds-verdi-develop.tar.gz (amd64, no suffix for backwards compatibility)
- hysds-verdi-develop-arm64.tar.gz (arm64, with suffix)
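The naming rule for the exported archives can be expressed as a small helper. This is only an illustration of the convention stated above; the function name export_name and its argument order are hypothetical and not taken from the actual job definition.

```shell
#!/bin/sh
# Build the export archive name for an image, tag, and architecture.
# amd64 keeps the historical un-suffixed name; arm64 gets an explicit suffix.
export_name() {
    name="$1"; tag="$2"; arch="$3"
    case "$arch" in
        amd64) echo "${name}-${tag}.tar.gz" ;;
        arm64) echo "${name}-${tag}-${arch}.tar.gz" ;;
        *) echo "unsupported arch: $arch" >&2; return 1 ;;
    esac
}

export_name hysds-verdi develop amd64
export_name hysds-verdi develop arm64
```

Keeping the amd64 name unchanged means existing consumers of hysds-verdi-develop.tar.gz continue to work without modification.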
If issues arise, revert to the previous single-job approach by:
- Restoring the original single-job definitions (with buildx emulation)
- Updating workflow to use single jobs instead of split jobs
- Re-enabling resource_class: large for emulated builds
- Removing architecture-specific tag suffixes