feat(ci): allow for supporting multi-arch images to be built and shipped #787

Open
knechtionscoding wants to merge 1 commit into kelos-dev:main from datagravity-ai:feat/multi-arch-images-upstream

Contributor

@knechtionscoding knechtionscoding commented Mar 24, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Implements native multi-architecture Docker container builds for ARM64 and AMD64 platforms, replacing QEMU emulation with native runner builds for significantly improved performance and reliability.

Key Changes:

  1. Release Workflow Redesign: Restructured .github/workflows/release.yaml to use separate jobs with native runners:

    • AMD64 builds on ubuntu-latest
    • ARM64 builds on ubuntu-24.04-arm
    • Creates multi-arch manifests combining both architectures
  2. Multi-Stage Dockerfiles: Updated all Dockerfiles to build binaries inside containers instead of copying pre-built binaries:

    • cmd/kelos-spawner/Dockerfile - converted to multi-stage build
    • cmd/ghproxy/Dockerfile - converted to multi-stage build
    • Other cmd Dockerfiles already used multi-stage builds ✓
  3. Shared Base Image: Refactored agent images to properly use the shared agent-base image:

    • codex/Dockerfile - now uses agent-base instead of duplicating Ubuntu setup
    • gemini/Dockerfile - simplified to use shared base
    • opencode/Dockerfile - simplified to use shared base
    • cursor/Dockerfile - simplified to use shared base
    • claude-code/Dockerfile already used shared base ✓
  4. Build Pipeline:

    • Parallel native builds (no QEMU emulation)
    • Proper dependency ordering: agent-base → agent-images
    • GitHub Actions caching per architecture
    • Multi-arch manifest creation for unified image tags
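
For reviewers unfamiliar with the pattern, a multi-stage build of the kind described in item 2 might look roughly like the sketch below. It is illustrative only and is not the Dockerfile from this PR: the `golang:1.25` builder tag is the one quoted in the review comments, while the distroless runtime image, the `/out` path, and the presence of `go.sum` are assumptions.

```dockerfile
# Illustrative sketch only; not the actual cmd/kelos-spawner/Dockerfile.

# Stage 1: compile the binary inside the container, so no pre-built
# binary has to be copied in from the host.
FROM golang:1.25 AS builder
WORKDIR /src

# Copy module files first so the dependency download layer can be cached.
# (Assumes the repository has both go.mod and go.sum.)
COPY go.mod go.sum ./
RUN go mod download

COPY . .

# TARGETOS and TARGETARCH are populated automatically by BuildKit.
# On a native runner they simply match the host platform.
ARG TARGETOS
ARG TARGETARCH
RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} \
    go build -o /out/kelos-spawner ./cmd/kelos-spawner

# Stage 2: minimal runtime image (assumed; the PR may use a different base).
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /out/kelos-spawner /kelos-spawner
ENTRYPOINT ["/kelos-spawner"]
```

Because the compile step runs inside the image build, the same Dockerfile can be built unchanged on both the `ubuntu-latest` and `ubuntu-24.04-arm` runners.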

Performance Benefits:

  • ~10x faster ARM64 builds (native vs QEMU emulation)
  • Parallel architecture builds instead of sequential
  • Reduced image duplication through proper base image sharing

Which issue(s) this PR is related to:

N/A

Special notes for your reviewer:

  • Runner Dependency: Requires ubuntu-24.04-arm runners to be available in the GitHub organization
  • Registry: All images push to ghcr.io/kelos-dev
  • Build Order: Agent images now correctly depend on agent-base being built first

The new pipeline creates architecture-specific tags (-amd64, -arm64) then combines them into multi-arch manifests for the main tags.

Does this PR introduce a user-facing change?

feat: introduces amd64 and arm64 container images

@cubic-dev-ai cubic-dev-ai bot left a comment

4 issues found across 9 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cursor/Dockerfile">

<violation number="1" location="cursor/Dockerfile:1">
P2: Builder toolchain is decoupled from `GO_VERSION` and uses a floating Go tag, reducing build reproducibility and risking version drift.</violation>
</file>

<file name="cmd/kelos-spawner/Dockerfile">

<violation number="1" location="cmd/kelos-spawner/Dockerfile:1">
P2: New builder base image uses a floating tag (`golang:1.25`), which can cause non-reproducible builds and external version drift.</violation>
</file>

<file name="gemini/Dockerfile">

<violation number="1" location="gemini/Dockerfile:1">
P2: Builder stage uses a floating Go image tag, making release artifacts non-reproducible across rebuilds.</violation>
</file>

<file name="claude-code/Dockerfile">

<violation number="1" location="claude-code/Dockerfile:1">
P2: New builder stage uses an unpinned `golang` image tag, making shipped binary builds non-deterministic and vulnerable to upstream image drift.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
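
One possible way to address the floating-tag findings above is sketched here, under stated assumptions. `GO_VERSION` is the build argument named in the first finding; the value `1.25.0` is a placeholder and not the project's actual pin.

```dockerfile
# Illustrative sketch only; the patch version below is a placeholder.
# An ARG declared before FROM can be used in the FROM line.
ARG GO_VERSION=1.25.0

# Deriving the builder tag from GO_VERSION ties the toolchain to a single
# declared version instead of a floating tag such as golang:1.25.
FROM golang:${GO_VERSION} AS builder

# For a stricter pin, a digest can be appended in the form
#   golang:<version>@sha256:<digest>
# so that a re-pushed tag cannot change the builder image.
```

A full patch version narrows drift to re-pushes of that tag; only a digest makes the builder image fully immutable.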

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cmd/kelos-token-refresher/Dockerfile">

<violation number="1" location="cmd/kelos-token-refresher/Dockerfile:10">
P2: The Dockerfile now builds `kelos-token-refresher` twice: once via `go build` and again via `make build`, which overwrites the same output. This makes the new build step redundant and risks different build flags being used in the final binary.</violation>
</file>
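
A minimal sketch of a builder stage that compiles the binary once is shown below. It is not the Dockerfile from this PR: the builder tag and the `/out` output path are assumptions, and the PR may instead prefer to keep `make build` as the single build step.

```dockerfile
# Illustrative sketch only; tag and output path are assumptions.
FROM golang:1.25 AS builder
WORKDIR /src
COPY . .

# Build the binary exactly once. If the Makefile is the source of truth for
# build flags, replace this line with `RUN make build` rather than keeping both,
# so the final binary cannot be overwritten by a second build with different flags.
RUN go build -o /out/kelos-token-refresher ./cmd/kelos-token-refresher
```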


CLAassistant commented Mar 26, 2026

CLA assistant check
All committers have signed the CLA.

@cubic-dev-ai cubic-dev-ai bot left a comment

7 issues found across 8 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="gemini/Dockerfile">

<violation number="1" location="gemini/Dockerfile:1">
P2: Defaulting the base image to `agent-base:latest` makes builds non-reproducible and can silently pull upstream changes; consider pinning to a specific version or digest.</violation>
</file>

<file name="opencode/Dockerfile">

<violation number="1" location="opencode/Dockerfile:1">
P2: Base image is pinned to a mutable `latest` tag, making builds non-reproducible and allowing silent upstream drift.</violation>
</file>

<file name="codex/Dockerfile">

<violation number="1" location="codex/Dockerfile:1">
P2: Base image uses mutable `latest` tag, making builds non-reproducible and allowing upstream changes without review.</violation>

<violation number="2" location="codex/Dockerfile:2">
P2: Removal of the `kelos-capture` binary copy likely breaks the entrypoint contract: `/kelos/kelos-capture` is still invoked by the entrypoint and documented as required, but this Dockerfile no longer ensures it exists in the image.</violation>
</file>

<file name="cursor/Dockerfile">

<violation number="1" location="cursor/Dockerfile:1">
P2: Base image is unpinned (`latest`), making builds non-reproducible and increasing supply-chain drift risk. Pin the base image to a specific version or digest.</violation>

<violation number="2" location="cursor/Dockerfile:2">
P2: cursor image no longer installs /kelos/kelos-capture, but the entrypoint still calls it, so the container will fail if the base image doesn’t provide that binary.</violation>
</file>

<file name=".github/workflows/release.yaml">

<violation number="1" location=".github/workflows/release.yaml:9">
P2: Per-ref concurrency allows tag release workflows to run in parallel, but the workflow still updates shared `latest` tags. Two tag releases close together can race and overwrite `latest` nondeterministically. Consider restoring a shared concurrency group for tag releases or otherwise serializing `push-latest`.</violation>
</file>
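
The `latest` and `kelos-capture` findings above could both be guarded against along the following lines. This is a hedged sketch, not the change made in this PR: it assumes the release workflow passes `BASE_IMAGE` explicitly, and that `agent-base` is expected to ship `/kelos/kelos-capture` at the path the entrypoint invokes.

```dockerfile
# Illustrative sketch of an agent image Dockerfile (e.g. codex/Dockerfile).

# No default value: the caller must pass
#   --build-arg BASE_IMAGE=<registry>/agent-base:<pinned tag or digest>
# If it is omitted, FROM fails on an empty image name instead of
# silently pulling agent-base:latest.
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

# Fail the build early if the base image does not provide the binary
# that the entrypoint calls, rather than failing at container start.
RUN test -x /kelos/kelos-capture
```

The trade-off is that a plain local `docker build` no longer works without the build argument, which may or may not be acceptable for contributors.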


-    && rm "/tmp/${TARBALL}" "/tmp/${TARBALL}.sha256"
-
-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest

@cubic-dev-ai cubic-dev-ai bot Mar 30, 2026

P2: Defaulting the base image to agent-base:latest makes builds non-reproducible and can silently pull upstream changes; consider pinning to a specific version or digest.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At gemini/Dockerfile, line 1:

<comment>Defaulting the base image to `agent-base:latest` makes builds non-reproducible and can silently pull upstream changes; consider pinning to a specific version or digest.</comment>

<file context>
@@ -1,40 +1,12 @@
-    && rm "/tmp/${TARBALL}" "/tmp/${TARBALL}.sha256"
-
-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest
+FROM ${BASE_IMAGE}
 
</file context>

-    && rm "/tmp/${TARBALL}" "/tmp/${TARBALL}.sha256"
-
-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest

@cubic-dev-ai cubic-dev-ai bot Mar 30, 2026

P2: Base image is pinned to a mutable latest tag, making builds non-reproducible and allowing silent upstream drift.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At opencode/Dockerfile, line 1:

<comment>Base image is pinned to a mutable `latest` tag, making builds non-reproducible and allowing silent upstream drift.</comment>

<file context>
@@ -1,40 +1,12 @@
-    && rm "/tmp/${TARBALL}" "/tmp/${TARBALL}.sha256"
-
-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest
+FROM ${BASE_IMAGE}
 
</file context>

-    && rm "/tmp/${TARBALL}" "/tmp/${TARBALL}.sha256"
-
-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest

@cubic-dev-ai cubic-dev-ai bot Mar 30, 2026

P2: Base image uses mutable latest tag, making builds non-reproducible and allowing upstream changes without review.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At codex/Dockerfile, line 1:

<comment>Base image uses mutable `latest` tag, making builds non-reproducible and allowing upstream changes without review.</comment>

<file context>
@@ -1,40 +1,12 @@
-    && rm "/tmp/${TARBALL}" "/tmp/${TARBALL}.sha256"
-
-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest
+FROM ${BASE_IMAGE}
 
</file context>


-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest
+FROM ${BASE_IMAGE}

@cubic-dev-ai cubic-dev-ai bot Mar 30, 2026

P2: Removal of the kelos-capture binary copy likely breaks the entrypoint contract: /kelos/kelos-capture is still invoked by the entrypoint and documented as required, but this Dockerfile no longer ensures it exists in the image.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At codex/Dockerfile, line 2:

<comment>Removal of the `kelos-capture` binary copy likely breaks the entrypoint contract: `/kelos/kelos-capture` is still invoked by the entrypoint and documented as required, but this Dockerfile no longer ensures it exists in the image.</comment>

<file context>
@@ -1,40 +1,12 @@
-
-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest
+FROM ${BASE_IMAGE}
 
 ARG CODEX_VERSION=0.117.0
</file context>


-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest
+FROM ${BASE_IMAGE}

@cubic-dev-ai cubic-dev-ai bot Mar 30, 2026

P2: cursor image no longer installs /kelos/kelos-capture, but the entrypoint still calls it, so the container will fail if the base image doesn’t provide that binary.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cursor/Dockerfile, line 2:

<comment>cursor image no longer installs /kelos/kelos-capture, but the entrypoint still calls it, so the container will fail if the base image doesn’t provide that binary.</comment>

<file context>
@@ -1,37 +1,9 @@
-
-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest
+FROM ${BASE_IMAGE}
 
 COPY cursor/kelos_entrypoint.sh /kelos_entrypoint.sh
</file context>

-    && rm "/tmp/${TARBALL}" "/tmp/${TARBALL}.sha256"
-
-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest

@cubic-dev-ai cubic-dev-ai bot Mar 30, 2026

P2: Base image is unpinned (latest), making builds non-reproducible and increasing supply-chain drift risk. Pin the base image to a specific version or digest.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cursor/Dockerfile, line 1:

<comment>Base image is unpinned (`latest`), making builds non-reproducible and increasing supply-chain drift risk. Pin the base image to a specific version or digest.</comment>

<file context>
@@ -1,37 +1,9 @@
-    && rm "/tmp/${TARBALL}" "/tmp/${TARBALL}.sha256"
-
-ENV PATH="/usr/local/go/bin:${PATH}"
+ARG BASE_IMAGE=agent-base:latest
+FROM ${BASE_IMAGE}
 
</file context>


 concurrency:
-  group: release
+  group: release-${{ github.ref }}

@cubic-dev-ai cubic-dev-ai bot Mar 30, 2026

P2: Per-ref concurrency allows tag release workflows to run in parallel, but the workflow still updates shared latest tags. Two tag releases close together can race and overwrite latest nondeterministically. Consider restoring a shared concurrency group for tag releases or otherwise serializing push-latest.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/release.yaml, line 9:

<comment>Per-ref concurrency allows tag release workflows to run in parallel, but the workflow still updates shared `latest` tags. Two tag releases close together can race and overwrite `latest` nondeterministically. Consider restoring a shared concurrency group for tag releases or otherwise serializing `push-latest`.</comment>

<file context>
@@ -6,23 +6,44 @@ on:
 
 concurrency:
-  group: release
+  group: release-${{ github.ref }}
   cancel-in-progress: false
 
</file context>
Suggested change
-  group: release-${{ github.ref }}
+  group: release

2 participants