This repository was archived by the owner on May 19, 2026. It is now read-only.
Merged
154 changes: 11 additions & 143 deletions .github/workflows/release.yml
@@ -14,81 +14,21 @@ env:
IMAGE: riscv-runner
PROD_URL: https://riseriscvrunnerappqdvknz9s-ghfe.functions.fnc.fr-par.scw.cloud
STAGING_URL: https://riseriscvrunnerappst73ndwr0w-ghfe.functions.fnc.fr-par.scw.cloud
GO_GHFE_URL: https://riseriscvrunnerappst73ndwr0w-ghfe-go.functions.fnc.fr-par.scw.cloud

jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0 # diff-cover needs the base commit
- name: Setup Python 3.12
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: "3.12"
- name: Run tests
run: |
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r container/requirements.txt
pip install -r requirements-dev.txt
PYTHONPATH=${{ github.workspace }}/container pytest

- name: Add diff coverage to step summary
if: github.event_name == 'pull_request' || github.event_name == 'push'
run: |
if [[ "${{ github.event_name }}" = "pull_request" ]]; then
BASE_SHA="${{ github.event.pull_request.base.sha }}"
BASE_REF_NAME="refs/heads/${{ github.event.pull_request.base.ref }}"
BASE_REF_URL="${{ github.event.pull_request.base.repo.html_url }}/tree/${BASE_SHA}"
elif [[ "${{ github.event_name }}" = "push" ]]; then
if [[ "${{ github.ref_name == github.event.repository.default_branch }}" = "true" ]]; then
# If we are on default branch
if [[ "${{ github.event.forced }}" = "true" ]]; then
# If we are force-pushing, we don't know what's the previous commit to compare to
echo "::error::.github/workflows/release.yml Branch ${{ github.ref_name}} was just force-pushed, can't measure diff-coverage"
exit 0 # do not fail the workflow nonetheless
fi
BASE_SHA="${{ github.event.before }}"
BASE_REF_NAME="${{ github.ref }}"
else
# If we are not on default branch, compare to default branch
git fetch origin ${{ github.event.repository.default_branch }}
BASE_SHA="$(git rev-parse origin/${{ github.event.repository.default_branch }})"
BASE_REF_NAME="${{ github.event.repository.default_branch }}"
fi
BASE_REF_URL="${{ github.event.repository.html_url }}/tree/${BASE_SHA}"
fi
if [[ -n "${BASE_SHA}" ]]; then
source .venv/bin/activate
diff-cover coverage.xml \
--compare-branch "${BASE_SHA}" \
--markdown-report diff-cover.md \
--fail-under 80
{
echo ""
echo "**Base ref: [${BASE_REF_NAME}](${BASE_REF_URL})**"
echo ""
cat diff-cover.md
} >> "$GITHUB_STEP_SUMMARY"
fi

test-go:
runs-on: ubuntu-latest
defaults:
run:
working-directory: container-go
working-directory: container
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
- name: Setup Go
uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v6.0.0
with:
go-version-file: container-go/go.mod
go-version-file: container/go.mod
- name: go vet
run: go vet ./...
- name: gofmt check
@@ -105,9 +45,11 @@ jobs:
build:
needs: [test]
runs-on: ubuntu-latest
defaults:
run:
working-directory: container
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: rlespinasse/github-slug-action@9e7def61550737ba68c62d34a32dd31792e3f429 # v5.5.0

- name: Setup Docker Buildx
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
@@ -125,7 +67,6 @@ jobs:
uses: docker/metadata-action@030e881283bb7a6894de51c315a6bfe6a94e05cf # v6.0.0

- name: Build ghfe image
id: ghfe
uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
with:
platforms: linux/amd64
@@ -137,11 +78,10 @@ jobs:
cache-from: |
type=gha,scope=docker
cache-to: |
${{ github.ref_name == github.event.repository.default_branch && 'type=gha,scope=type=gha,scope=docker' || '' }}
${{ github.ref_name == github.event.repository.default_branch && 'type=gha,scope=docker' || '' }}
push: ${{ github.repository_owner == 'riseproject-dev' }}

- name: Build scheduler image
id: scheduler
uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
with:
platforms: linux/amd64
@@ -153,63 +93,13 @@ jobs:
cache-from: |
type=gha,scope=docker
cache-to: |
${{ github.ref_name == github.event.repository.default_branch && 'type=gha,scope=type=gha,scope=docker' || '' }}
push: ${{ github.repository_owner == 'riseproject-dev' }}

build-go:
needs: [test-go]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Setup Docker Buildx
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0

- name: Login to Container Registry
if: github.repository_owner == 'riseproject-dev'
uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
with:
registry: ${{ env.REGISTRY }}
username: nologin
password: ${{ secrets.SCW_SECRET_KEY }}

- name: Extract metadata for Docker
id: meta
uses: docker/metadata-action@030e881283bb7a6894de51c315a6bfe6a94e05cf # v6.0.0

- name: Build ghfe-go image
uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
with:
platforms: linux/amd64
context: container-go
file: container-go/Dockerfile
target: ghfe
tags: ${{ env.REGISTRY }}/${{ env.IMAGE }}:ghfe-sha-${{ github.sha }}-go
labels: ${{ steps.meta.outputs.labels }}
cache-from: |
type=gha,scope=docker-go
cache-to: |
${{ github.ref_name == github.event.repository.default_branch && 'type=gha,scope=docker-go' || '' }}
push: ${{ github.repository_owner == 'riseproject-dev' }}

- name: Build scheduler-go image
uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
with:
platforms: linux/amd64
context: container-go
file: container-go/Dockerfile
target: scheduler
tags: ${{ env.REGISTRY }}/${{ env.IMAGE }}:scheduler-sha-${{ github.sha }}-go
labels: ${{ steps.meta.outputs.labels }}
cache-from: |
type=gha,scope=docker-go
cache-to: |
${{ github.ref_name == github.event.repository.default_branch && 'type=gha,scope=docker-go' || '' }}
${{ github.ref_name == github.event.repository.default_branch && 'type=gha,scope=docker' || '' }}
push: ${{ github.repository_owner == 'riseproject-dev' }}

deploy-staging:
if: github.repository_owner == 'riseproject-dev' && github.ref_name == github.event.repository.default_branch
name: "deploy to staging"
needs: [build, build-go]
needs: [build]
runs-on: ubuntu-latest
environment: staging
concurrency:
@@ -240,19 +130,8 @@ jobs:
-t ${{ env.REGISTRY }}/${{ env.IMAGE }}:scheduler-staging \
${{ env.REGISTRY }}/${{ env.IMAGE }}:scheduler-sha-${{ github.sha }}

- name: Tag ghfe-go image for staging
run: >-
docker buildx imagetools create \
-t ${{ env.REGISTRY }}/${{ env.IMAGE }}:ghfe-staging-go \
${{ env.REGISTRY }}/${{ env.IMAGE }}:ghfe-sha-${{ github.sha }}-go

- name: Tag scheduler-go image for staging
run: >-
docker buildx imagetools create \
-t ${{ env.REGISTRY }}/${{ env.IMAGE }}:scheduler-staging-go \
${{ env.REGISTRY }}/${{ env.IMAGE }}:scheduler-sha-${{ github.sha }}-go

- name: Deploy to Scaleway
working-directory: container
run: npm ci && npx serverless deploy --stage=staging
env:
SCW_SECRET_KEY: ${{ secrets.SCW_SECRET_KEY }}
@@ -279,7 +158,7 @@ jobs:
deploy-prod:
if: github.repository_owner == 'riseproject-dev' && github.ref_name == github.event.repository.default_branch
name: "deploy to prod"
needs: [build, build-go, deploy-staging]
needs: [build, deploy-staging]
runs-on: ubuntu-latest
environment: prod
concurrency:
@@ -310,19 +189,8 @@ jobs:
-t ${{ env.REGISTRY }}/${{ env.IMAGE }}:scheduler-prod \
${{ env.REGISTRY }}/${{ env.IMAGE }}:scheduler-sha-${{ github.sha }}

- name: Tag ghfe-go image for prod
run: >-
docker buildx imagetools create \
-t ${{ env.REGISTRY }}/${{ env.IMAGE }}:ghfe-prod-go \
${{ env.REGISTRY }}/${{ env.IMAGE }}:ghfe-sha-${{ github.sha }}-go

- name: Tag scheduler-go image for prod
run: >-
docker buildx imagetools create \
-t ${{ env.REGISTRY }}/${{ env.IMAGE }}:scheduler-prod-go \
${{ env.REGISTRY }}/${{ env.IMAGE }}:scheduler-sha-${{ github.sha }}-go

- name: Deploy to Scaleway
working-directory: container
run: npm ci && npx serverless@3 deploy --stage=main
env:
SCW_SECRET_KEY: ${{ secrets.SCW_SECRET_KEY }}
88 changes: 37 additions & 51 deletions README.md
@@ -52,13 +52,12 @@ The system is split into two containers:
GitHub (workflow_job webhook)
|
v
ghfe (ghfe.py)
| - Proxies webhooks to staging for staging entities (prod only)
ghfe (container/cmd/ghfe)
| - Verifies webhook signature
| - Validates labels, determines entity type (org or personal)
| - Resolves (entity_id, job_labels) -> (k8s_pool, k8s_image)
| - Writes job to PostgreSQL
| - Serves /usage, /history
| - Serves /setup/{org,personal}, /trace/*
| - NO GitHub API calls, NO k8s calls
|
v
@@ -69,7 +68,7 @@ PostgreSQL (state store)
| - LISTEN/NOTIFY: wakes scheduler on new jobs
|
v
Scheduler (scheduler.py)
Scheduler (container/cmd/scheduler)
| - sync_jobs_state: sync job status with GitHub
| - sync_workers_state: runs under a per-scheduler LOCK TABLE workers advisory,
| 5 phases (atomic, single transaction):
@@ -219,9 +218,10 @@ chronological order:

Each row carries the full payload as `JSONB`, plus filter/index keys
(`installation_id`, `app_id`, `entity_type`, `entity_id`, `entity_name`)
and a free-form `outcome` string. The `WebhookOutcome` enum in
`constants.py` is the canonical list of outcome values; the column itself
is `TEXT` so new outcomes don't require schema migrations. `entity_id`
and a free-form `outcome` string. The `WebhookOutcome` type in
`container/internal/contract.go` is the canonical list of outcome values;
the column itself is `TEXT` so new outcomes don't require schema
migrations. `entity_id`
is the GitHub `account.id`, which is stable across renames and reinstalls
— uninstalling and reinstalling the app produces a new `installation_id`
but keeps the same `entity_id`.
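
As an illustration of why a `TEXT` column pairs well with a string-backed Go type, here is a minimal sketch. The constant names and values (`OutcomeAccepted`, `OutcomeIgnored`, `"rate_limited"`) and the `known` helper are hypothetical and are not taken from `container/internal/contract.go`; only the `WebhookOutcome` type name comes from the text above.

```go
package main

import "fmt"

// WebhookOutcome is a string-backed type, so its values map directly onto
// a TEXT column without any schema-level enum.
type WebhookOutcome string

// Hypothetical outcome values, for illustration only.
const (
	OutcomeAccepted WebhookOutcome = "accepted"
	OutcomeIgnored  WebhookOutcome = "ignored"
)

// known reports whether this build recognises the outcome value.
func (o WebhookOutcome) known() bool {
	switch o {
	case OutcomeAccepted, OutcomeIgnored:
		return true
	}
	return false
}

func main() {
	// A value written by a newer build still reads back as a plain string;
	// older code can display it even though it does not recognise it.
	fromDB := WebhookOutcome("rate_limited")
	fmt.Println(fromDB, fromDB.known())
	fmt.Println(OutcomeAccepted, OutcomeAccepted.known())
}
```

Under this assumption, adding an outcome is a code-only change: a new constant, with no database migration.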
@@ -236,10 +236,11 @@ has no UNIQUE constraint on payload so a duplicate log row is acceptable
(the trace endpoints can dedupe by `delivery_id` from the JSONB payload
when needed).

The scheduler's `_gh_authenticate_app` wrapper logs only failures
(`gh.authenticate_app` is `@ttl_cache`-decorated, so success is the hot
path). `cachetools.func.ttl_cache` does not cache exceptions, so transient
errors don't poison subsequent calls.
The scheduler's `ghAuthenticate` wrapper
(`container/cmd/scheduler/gh_auth.go`) only records failures: the
underlying `AuthenticateApp` is TTL-cached, so success is the hot path
and would drown the log. Failures are not cached, so transient errors
don't poison subsequent calls.

#### State reconstruction

@@ -391,12 +392,13 @@ The scheduler iterates pending jobs in FIFO order. For each job:

### Configuration

Per-entity configuration is defined in `ENTITY_CONFIG` in `constants.py`, keyed by entity ID (org ID or user ID):
Per-entity configuration is defined in `EntityConfigs` in
`container/internal/constants.go`, keyed by entity ID (org ID or user ID):

| Field | Type | Description |
|-------|------|-------------|
| `max_workers` | int or None | Maximum concurrent workers across all pools. None = unlimited |
| `staging` | bool | If true, webhooks are proxied from prod to staging |
| `MaxWorkers` | `*int` | Maximum concurrent workers across all pools. `nil` = unlimited |
| `Staging` | `[]string` | Repository names whose webhooks should be proxied from prod to staging |
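
To make the `nil` = unlimited convention concrete, here is a minimal sketch of how such a configuration could be declared and consulted. Only the field names `MaxWorkers` and `Staging` come from the table above; the entity IDs, the repository name, and the helpers `intPtr`, `canStartWorker`, and `proxiesToStaging` are assumptions made for illustration and may not match `container/internal/constants.go`.

```go
package main

import "fmt"

// EntityConfig mirrors the two documented fields.
type EntityConfig struct {
	MaxWorkers *int     // nil means unlimited
	Staging    []string // repository names proxied from prod to staging
}

// intPtr is a small helper for taking the address of a literal.
func intPtr(n int) *int { return &n }

// entityConfigs holds hypothetical entries keyed by entity ID.
var entityConfigs = map[int64]EntityConfig{
	1001: {MaxWorkers: intPtr(4), Staging: []string{"example-repo"}},
	1002: {}, // no limit, nothing proxied
}

// canStartWorker reports whether one more worker fits under the limit.
func canStartWorker(cfg EntityConfig, running int) bool {
	return cfg.MaxWorkers == nil || running < *cfg.MaxWorkers
}

// proxiesToStaging reports whether webhooks for repo go to staging.
func proxiesToStaging(cfg EntityConfig, repo string) bool {
	for _, r := range cfg.Staging {
		if r == repo {
			return true
		}
	}
	return false
}

func main() {
	limited := entityConfigs[1001]
	fmt.Println(canStartWorker(limited, 3), canStartWorker(limited, 4))
	fmt.Println(canStartWorker(entityConfigs[1002], 1000))
	fmt.Println(proxiesToStaging(limited, "example-repo"))
}
```

A pointer distinguishes "no limit" (`nil`) from an explicit limit of zero, which a plain `int` could not express.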

### HTTP routes

@@ -406,8 +408,8 @@ Per-entity configuration is defined in `ENTITY_CONFIG` in `constants.py`, keyed
|-------|--------|-------------|
| `/` | POST | Webhook endpoint for `workflow_job` events |
| `/health` | GET | Health check (returns `ok`) |
| `/usage` | GET | Human-readable view of per-pool jobs and workers |
| `/history` | GET | Job history sorted by status (pending, running, completed) then creation time |
| `/setup/org` | GET | GitHub App post-install landing page for organization installations |
| `/setup/personal` | GET | GitHub App post-install landing page for personal-account installations |
| `/trace/entity/<entity_id>` | GET | Installation event log for an entity (requires bearer token) |
| `/trace/installation/<installation_id>` | GET | Resolves to `entity_id` then returns its event log |
| `/trace/job/<job_id>` | GET | Resolves to `entity_id` via `jobs.entity_id` then returns its event log |
@@ -418,38 +420,26 @@ Per-entity configuration is defined in `ENTITY_CONFIG` in `constants.py`, keyed
| Route | Method | Description |
|-------|--------|-------------|
| `/health` | GET | Health check (returns `ok`) |
| `/usage` | GET | Human-readable view of per-pool jobs and workers (`/usage.json` for JSON) |
| `/history`, `/jobs` | GET | Job history sorted by status then creation time (`.json` variants for JSON) |
| `/workers` | GET | Worker history with `failure_info` for failed workers (`.json` variant for JSON) |

### Key files

| File | Purpose |
|------|---------|
| `container/constants.py` | Environment configuration, entity config, image tags |
| `container/ghfe.py` | Flask webhook handler -- validates requests, writes to PostgreSQL |
| `container/scheduler.py` | Scheduler -- GH reconciliation, demand matching, cleanup, worker status sync |
| `container/k8s.py` | Kubernetes pod provisioning, deletion, capacity checks, failure info collection |
| `container/db.py` | PostgreSQL database operations |
| `container/github.py` | GitHub API functions (auth, runner groups, JIT config, job status) |
| `container/Dockerfile` | Docker image for the ghfe and scheduler containers |
| `container-go/` | Go reimplementation of ghfe and scheduler (see `container-go/CONTRACT.md`) |
| `container/cmd/ghfe/` | Webhook handler — validates requests, writes to PostgreSQL, serves `/setup/*` and `/trace/*` |
| `container/cmd/scheduler/` | Scheduler — GH reconciliation, demand matching, cleanup, worker status sync; serves `/usage`, `/history`, `/jobs`, `/workers` |
| `container/internal/constants.go` | Environment configuration, `EntityConfigs`, timeouts, image tags |
| `container/internal/contract.go` | Shared types, `WebhookOutcome` enum, DB/GitHub/Kube interfaces |
| `container/internal/db.go` | PostgreSQL operations (pgx) |
| `container/internal/github.go` | GitHub App auth + REST client |
| `container/internal/k8s.go` | Kubernetes pod provisioning, deletion, capacity checks, failure-info collection |
| `container/internal/testutil/` | In-memory fakes shared by `cmd/` tests |
| `container/Dockerfile` | Multi-stage build producing the `ghfe` and `scheduler` images |
| `container/serverless.yml` | Scaleway Serverless deployment manifest |
| `scripts/trace_installation.py` | CLI client for the `/trace/*` endpoints — chronological table + diagnosis hints |

### Go cutover

`container-go/` ships a Go reimplementation deployed alongside the Python tree
as the `ghfe-go` and `scheduler-go` Scaleway functions. Cutover is gradual:

- **ghfe**: set `GO_GHFE_URL` on the Python ghfe function to the Go ghfe URL,
then populate `GO_GHFE_ROUTING={"entities":[<entity_id>, ...]}` with the
GitHub owner ids to forward. Only `workflow_job` webhooks are routed; the
staging proxy at `container/ghfe.py:509` still runs first. Rollback is
removing entries from `GO_GHFE_ROUTING`.
- **scheduler**: single deployment. Once staging has soaked on the Go
scheduler, swap the prod `scheduler` function's image to
`scheduler-prod-go`. Rollback is the inverse image swap.

See `container-go/CONTRACT.md` for the frozen behavioral surface the Go port
must preserve.

### Infrastructure

| Service | Product | Purpose |
@@ -465,20 +455,16 @@ Production and staging each have their own k8s cluster, provisioned via the `scr

## Development

Create a python venv and install dev dependencies:
```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements-dev.txt
```
The containers are pure Go. From `container/`:

Run tests:
```bash
source .venv/bin/activate && PYTHONPATH=container python3 -m pytest
go vet ./...
gofmt -l . # lists unformatted files; prints nothing if everything is formatted
go test -race ./...
```

Tests mock PostgreSQL and Kubernetes -- no live services are required.
Tests run against in-memory fakes for PostgreSQL, the GitHub API, and the
Kubernetes API — no live services are required.

## Deployment
