This document describes how to build Docker images using GitHub Actions and deploy them to RunPod and Google Cloud Run with GPU support.
The repository includes a workflow located at `.github/workflows/docker-build.yml`. This workflow builds two Docker images whenever changes are pushed to `main` or a pull request is opened. It installs dependencies with uv and caches the build layers for faster rebuilds:
- `runpod:latest` – image intended for RunPod
- `cloudrun:latest` – image intended for Google Cloud Run
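A minimal sketch of such a workflow is shown below. The Dockerfile names (`Dockerfile.runpod`, `Dockerfile.cloudrun`), the matrix layout, and the GitHub Actions cache settings are assumptions for illustration, not the repository's actual configuration:

```yaml
name: docker-build
on:
  push:
    branches: [main]
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    strategy:
      matrix:
        target: [runpod, cloudrun]
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          file: Dockerfile.${{ matrix.target }}
          # Only push images on pushes to main, not on pull requests.
          push: ${{ github.event_name == 'push' }}
          tags: ghcr.io/${{ github.repository }}/${{ matrix.target }}:latest
          # Reuse build layers across workflow runs via the GitHub Actions cache.
          cache-from: type=gha
          cache-to: type=gha,mode=max
```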
Both images are published to the GitHub Container Registry under `ghcr.io/<OWNER>/<REPO>`.
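To pull these images outside of Actions (required if the packages are private), authenticate to GHCR first. The `CR_PAT` environment variable, assumed here to hold a personal access token with `read:packages` scope, is illustrative:

```shell
# Log in to GitHub Container Registry.
# CR_PAT is a hypothetical env var holding a token with read:packages scope.
echo "$CR_PAT" | docker login ghcr.io -u <USERNAME> --password-stdin

# Pull the RunPod image built by the workflow.
docker pull ghcr.io/<OWNER>/<REPO>/runpod:latest
```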
The provided Dockerfile installs packages via uv pip and uses BuildKit cache mounts to speed up dependency installation.
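Concretely, the cache-mount pattern looks roughly like this. The base image, requirements file name, and overall layout are assumptions about the repository's Dockerfile, not a copy of it:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.11-slim

# Install uv itself.
RUN pip install --no-cache-dir uv

WORKDIR /app
COPY requirements.txt .

# BuildKit cache mount: uv's download cache persists across builds,
# so unchanged dependencies are not re-downloaded each time.
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --system -r requirements.txt

COPY . .
EXPOSE 8000
```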
- Ensure you have a RunPod account and have created a container workspace with GPU access.
- In RunPod, configure your deployment to pull the `runpod:latest` image from GHCR.
- Expose port 8000. The server will start automatically with the default command from the Dockerfile.
- Optionally mount any model or data volumes required by the application.
- Enable the Cloud Run API and create a new service with GPU support. L4 GPUs are available in `us-central1` and other supported regions.
- Grant Cloud Run permission to access Artifact Registry or GHCR, wherever the image is stored.
- Deploy using the `cloudrun:latest` image:

  ```shell
  gcloud run deploy sd-server \
    --image=ghcr.io/<OWNER>/<REPO>/cloudrun:latest \
    --region=us-central1 \
    --gpu=1 \
    --gpu-type=nvidia-l4 \
    --memory=8Gi \
    --min-instances=0 \
    --max-instances=1
  ```

- Make sure to allocate sufficient memory and enable GPUs for the service.
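After the deploy completes, you can confirm the service came up and retrieve its public URL; the service name `sd-server` matches the deploy command above:

```shell
# Print the deployed service's public URL.
gcloud run services describe sd-server \
  --region=us-central1 \
  --format='value(status.url)'
```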
To test the Docker image locally with GPU support, install the NVIDIA Container Toolkit and run:
```shell
docker run --gpus all -p 8000:8000 ghcr.io/<OWNER>/<REPO>/runpod:latest
```

The API will be available at http://localhost:8000.
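Once the container is up, a quick smoke test from another terminal (this assumes the server answers plain HTTP on the root path; substitute whatever route the application actually serves):

```shell
# Expect an HTTP response once the server inside the container is ready.
curl -i http://localhost:8000/
```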