
feat(tasks): add cluster:gpu task and wire e2e:gpu to use it#547

Open
elezar wants to merge 1 commit into main from feat/gpu-e2e-support


elezar (Member) commented Mar 23, 2026

Summary

Adds a cluster:gpu mise task that bootstraps the cluster with NVIDIA GPU passthrough enabled (--gpu flag), and updates e2e:python:gpu to depend on it instead of the plain cluster task. Also updates gpu_sandbox_spec in the e2e conftest to default to an empty image string, deferring image resolution to the server.

Related Issue

Changes

  • tasks/cluster.toml: add a cluster:gpu task that sets the CLUSTER_GPU=1 env var
  • tasks/scripts/cluster-bootstrap.sh: pass --gpu to openshell gateway start when CLUSTER_GPU=1
  • tasks/test.toml: wire e2e:python:gpu to depend on cluster:gpu instead of cluster
  • e2e/python/conftest.py: default GPU sandbox image to "" so the server resolves the configured default; allow override via OPENSHELL_E2E_GPU_IMAGE
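The conftest change comes down to an environment lookup with an empty-string fallback. Here is a minimal Python sketch. It assumes the override is read directly from os.environ. `resolve_gpu_image` is a hypothetical helper name, not the actual gpu_sandbox_spec code, whose shape the PR does not show.

```python
import os

# Env var name taken from the PR description.
GPU_IMAGE_ENV = "OPENSHELL_E2E_GPU_IMAGE"


def resolve_gpu_image(environ=None):
    """Return the image for GPU sandboxes (hypothetical helper).

    An empty string means "let the server resolve its configured default";
    a value in OPENSHELL_E2E_GPU_IMAGE overrides that.
    """
    env = os.environ if environ is None else environ
    return env.get(GPU_IMAGE_ENV, "")


if __name__ == "__main__":
    print(repr(resolve_gpu_image({})))
    print(repr(resolve_gpu_image({GPU_IMAGE_ENV: "registry.example/gpu:dev"})))
```

Defaulting to "" instead of a hard-coded tag keeps the test suite from drifting out of sync with the server's configured GPU image.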

Testing

  • mise run pre-commit passes
  • Unit tests added/updated (not applicable)
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@elezar elezar self-assigned this Mar 23, 2026
@elezar elezar marked this pull request as ready for review March 23, 2026 15:09
@elezar elezar requested a review from a team as a code owner March 23, 2026 15:09
pimlock (Collaborator) left a comment

Thank you, this is going to be very handy for GPU cluster dev.

Could you please replace the separate cluster:gpu task with an env var?

["e2e:python:gpu"]
description = "Run Python GPU e2e tests"
-depends = ["python:proto", "cluster"]
+depends = ["python:proto", "cluster:gpu"]

Overall, we prefer not to add new tasks unless they are really necessary. In this case, we could drop the extra task and set the env var directly in the dependency (mise supports this).

Suggested change
-depends = ["python:proto", "cluster:gpu"]
+depends = ["python:proto", "CLUSTER_GPU=1 cluster"]
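Either way, the bootstrap script, not the task, decides whether to pass --gpu. All that matters is that CLUSTER_GPU=1 reaches it. The sketch below models that gating in Python rather than the script's own shell. It assumes exactly "1" counts as enabled. `gateway_start_command` is hypothetical, and the real script may pass additional arguments.

```python
import os


def gateway_start_command(environ=None):
    """Build the gateway start argv (hypothetical model of cluster-bootstrap.sh)."""
    env = os.environ if environ is None else environ
    cmd = ["openshell", "gateway", "start"]
    # Assumption: only the exact value "1" enables GPU passthrough,
    # matching the CLUSTER_GPU=1 convention used in this PR.
    if env.get("CLUSTER_GPU") == "1":
        cmd.append("--gpu")
    return cmd


if __name__ == "__main__":
    print(gateway_start_command({}))
    print(gateway_start_command({"CLUSTER_GPU": "1"}))
```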

Comment on lines +10 to +14
["cluster:gpu"]
description = "Bootstrap or incremental deploy with NVIDIA GPU passthrough enabled"
env = { CLUSTER_GPU = "1" }
run = "tasks/scripts/cluster.sh"


We can remove this as a separate task and use the env var instead.



2 participants