
feat: default GPU endpoints to minCudaVersion 12.8#277

Open
KAJdev wants to merge 9 commits into main from zeke/ae-2408-flash-default-gpu-endpoints-to-mincuda-128
Conversation


@KAJdev KAJdev commented Mar 17, 2026

GPU endpoints now default to minCudaVersion = "12.8" so that workers only run on hosts with a sufficiently recent CUDA driver. The value can be overridden per endpoint via Endpoint(min_cuda_version=...) or directly on resource classes. CPU endpoints always have minCudaVersion cleared, and the field is excluded from their API payload.

Validation

minCudaVersion is validated against the CudaVersion enum. Invalid values raise a ValueError listing the accepted versions.
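A minimal, self-contained sketch of that validation pattern (the enum members and function name here are illustrative stand-ins, not the SDK's actual definitions):

```python
from enum import Enum


class CudaVersion(str, Enum):
    """Stand-in for the SDK's CudaVersion enum; members are illustrative."""
    V12_4 = "12.4"
    V12_8 = "12.8"


def validate_min_cuda_version(value: str) -> str:
    """Reject values outside the enum, listing the accepted versions."""
    accepted = [v.value for v in CudaVersion]
    if value not in accepted:
        raise ValueError(
            f"Invalid minCudaVersion {value!r}; accepted versions: {accepted}"
        )
    return value
```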

Closes AE-2408

KAJdev force-pushed the zeke/ae-2408-flash-default-gpu-endpoints-to-mincuda-128 branch from 6b0514a to 8b625a6 on March 17, 2026 at 21:49

@runpod-Henrik runpod-Henrik left a comment


Question: Two test gaps in the plumbing

endpoint.py and resource_provisioner.py both changed to carry minCudaVersion through, but neither has a test:

  • No test for Endpoint(min_cuda_version="12.4")._build_resource_config() → resource gets minCudaVersion="12.4"
  • No test for create_resource_from_manifest with {"minCudaVersion": "12.4"} in the manifest data → resource gets the value

The field-level tests on ServerlessResource are solid, but the plumbing from Endpoint decorator to provisioner is untested end-to-end.
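The two missing tests might look roughly like the sketch below. FakeEndpoint and fake_create_resource_from_manifest are minimal stand-ins so the sketch runs standalone; the real tests would exercise Endpoint and create_resource_from_manifest from the SDK itself.

```python
class FakeEndpoint:
    """Minimal stand-in mirroring the min_cuda_version plumbing under review."""

    def __init__(self, min_cuda_version=None):
        self.min_cuda_version = min_cuda_version

    def _build_resource_config(self):
        kwargs = {}
        if self.min_cuda_version is not None:
            kwargs["minCudaVersion"] = self.min_cuda_version
        return kwargs


def fake_create_resource_from_manifest(data):
    """Stand-in for manifest-driven provisioning: pass minCudaVersion through."""
    resource = {}
    if "minCudaVersion" in data:
        resource["minCudaVersion"] = data["minCudaVersion"]
    return resource


def test_endpoint_plumbs_min_cuda_version():
    config = FakeEndpoint(min_cuda_version="12.4")._build_resource_config()
    assert config["minCudaVersion"] == "12.4"


def test_manifest_plumbs_min_cuda_version():
    resource = fake_create_resource_from_manifest({"minCudaVersion": "12.4"})
    assert resource["minCudaVersion"] == "12.4"
```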


Verdict

Clean, well-structured change with good test coverage on the field itself. The two plumbing tests above would close the remaining coverage gap.


@runpod-Henrik runpod-Henrik left a comment


1. Core change — clean

minCudaVersion added to ServerlessResource (default "12.8"), exposed as min_cuda_version on Endpoint, cleared for CPU, validated against the CudaVersion enum, plumbed through manifest → provisioner → GraphQL query, and included in _hashed_fields / _has_structural_changes. Test coverage is solid — 10 tests in TestMinCudaVersion covering defaults, overrides, validation, CPU clearing, hash and structural-change behaviour.

2. Issue: Existing GPU endpoints get silently re-provisioned on next deploy

minCudaVersion is in _hashed_fields. Existing deployed endpoints have no minCudaVersion in their stored config. After upgrading, the first flash deploy sees the new "12.8" default as a structural change and triggers re-provisioning for every GPU endpoint — even if nothing else changed. For busy production endpoints that's an unexpected rolling restart with no warning.
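The mechanism is easy to see in isolation: once a new field joins the hashed set, a stored config that predates it can no longer reproduce the same hash. A self-contained sketch (the SDK's actual hashing scheme may differ):

```python
import hashlib
import json


def config_hash(cfg: dict) -> str:
    """Hash a config dict deterministically (illustrative, not the SDK's exact scheme)."""
    return hashlib.sha256(json.dumps(cfg, sort_keys=True).encode()).hexdigest()


# A config stored before the upgrade has no minCudaVersion...
stored = {"gpuCount": 1, "imageName": "worker:v1"}
# ...but after upgrading, the "12.8" default is included in the hash input.
current = {**stored, "minCudaVersion": "12.8"}

# The hashes differ, so drift detection reports a structural change and
# re-provisions the endpoint even though the user changed nothing.
assert config_hash(stored) != config_hash(current)
```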

3. Issue: No way to opt out of the "12.8" floor

In _build_resource_config:

if self.min_cuda_version is not None:
    kwargs["minCudaVersion"] = self.min_cuda_version

None means "don't include in kwargs", which causes ServerlessResource to fall back to its "12.8" default. A user who passes Endpoint(min_cuda_version=None) expecting to remove the constraint gets "12.8" silently. If there are workloads that need to run on older drivers, they have no path to opt out.
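One conventional fix is a module-level sentinel so the constructor can tell "argument omitted" apart from an explicit None. A sketch of that shape (names and structure are hypothetical, not the SDK's actual implementation):

```python
_UNSET = object()  # sentinel: distinguishes "argument omitted" from explicit None


class Endpoint:
    """Sketch of a constructor with an opt-out path for the CUDA floor."""

    def __init__(self, min_cuda_version=_UNSET):
        if min_cuda_version is _UNSET:
            min_cuda_version = "12.8"  # GPU default applies only when omitted
        self.min_cuda_version = min_cuda_version  # explicit None means "no floor"

    def _build_resource_config(self):
        # Always forward the value, including an explicit None, so the
        # resource layer does not silently fall back to its "12.8" default.
        return {"minCudaVersion": self.min_cuda_version}
```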

Nit: SDK Reference shows None as the constructor default

min_cuda_version: Optional[str] = None

But the table note says "GPU endpoints default to "12.8" when not set." The effective default for GPU is "12.8" — the None in the Endpoint signature is an implementation detail. Users who see = None and try passing it explicitly to clear the constraint will be confused when it doesn't work. Consider documenting the signature as min_cuda_version: Optional[str] = None # GPU endpoints default to "12.8" or updating the table default column to show "12.8" for GPU.


Verdict: PASS WITH NITS

The implementation is correct. Items 2 and 3 are worth a quick look before merge — particularly whether the silent re-provision on upgrade is acceptable or needs a migration note in the changelog.

🤖 Reviewed by Henrik's AI-Powered Bug Finder


Copilot AI left a comment


Pull request overview

This PR updates the serverless resource model and provisioning pipeline so GPU endpoints default to minCudaVersion = "12.8", while CPU endpoints always clear minCudaVersion and exclude it from API payloads/manifests. It also adds validation to ensure minCudaVersion is one of the supported CudaVersion enum values.

Changes:

  • Default GPU ServerlessResource.minCudaVersion to "12.8" and validate it against CudaVersion (invalid values raise ValueError with accepted versions).
  • Ensure CPU endpoint variants clear minCudaVersion and treat it as an input-only/excluded field for payload + drift/hash logic.
  • Thread minCudaVersion through Endpoint → manifest extraction → manifest provisioning → GraphQL saveEndpoint response selection, with unit tests and docs updated accordingly.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Summary per file:

  • tests/unit/resources/test_serverless_cpu.py — Adds assertions that CPU endpoints clear minCudaVersion and exclude it from hash/payload inputs.
  • tests/unit/resources/test_serverless.py — Adds a focused test suite for minCudaVersion defaults, hashing/structural drift behavior, and validation errors.
  • tests/unit/resources/test_cpu_load_balancer.py — Ensures CPU load balancer endpoints clear/exclude minCudaVersion consistently with other GPU-only fields.
  • src/runpod_flash/runtime/resource_provisioner.py — Allows manifest-driven provisioning to pass through minCudaVersion when present.
  • src/runpod_flash/endpoint.py — Adds min_cuda_version to the public Endpoint(...) API and forwards it into the resource config payload.
  • src/runpod_flash/core/resources/serverless_cpu.py — Clears minCudaVersion for CPU endpoints and excludes it from CPU payload input-only fields.
  • src/runpod_flash/core/resources/serverless.py — Adds the minCudaVersion field default/validation and includes it in hashed/structural change detection.
  • src/runpod_flash/core/resources/load_balancer_sls_resource.py — Excludes minCudaVersion from CPU load balancer payload input-only fields.
  • src/runpod_flash/core/api/runpod.py — Includes minCudaVersion in the saveEndpoint GraphQL selection set.
  • src/runpod_flash/cli/commands/build_utils/manifest.py — Extracts minCudaVersion into generated manifests when present (non-None).
  • docs/Resource_Config_Drift_Detection.md — Documents minCudaVersion as a GPU-only field excluded from CPU hashing.
  • docs/GPU_Provisioning.md — Documents the Endpoint min_cuda_version ↔ API minCudaVersion mapping and default behavior.
  • docs/Flash_SDK_Reference.md — Documents the new Endpoint(..., min_cuda_version=...) parameter and its semantics.



promptless bot commented Mar 23, 2026

📝 Documentation updates detected!

New suggestion: Document min_cuda_version parameter for Flash GPU endpoints




Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.




deanq commented Mar 23, 2026

Should we also add 12.9 and 13.0 in the CudaVersion enum at this point?
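For reference, that extension would presumably be two more members on the enum; a sketch with hypothetical member names and abridged existing values (the actual enum lives in the SDK):

```python
from enum import Enum


class CudaVersion(str, Enum):
    """Sketch of the enum with the proposed additions (existing members abridged)."""
    V12_4 = "12.4"
    V12_8 = "12.8"
    V12_9 = "12.9"  # proposed addition
    V13_0 = "13.0"  # proposed addition
```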

@KAJdev KAJdev requested a review from deanq March 24, 2026 19:20

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.



Comment on lines 364 to +403
@@ -405,6 +400,7 @@ def __init__(
        self._explicit_scaler_type = scaler_type
        self.scaler_value = scaler_value
        self.template = template
        self.min_cuda_version = min_cuda_version

Copilot AI Mar 24, 2026


min_cuda_version defaults to CudaVersion.V12_8 for all endpoints, including CPU and pure-client (id=) cases. This conflicts with the SDK reference (docs show default None) and the stated behavior that only GPU endpoints should default to 12.8. Consider making the constructor default None and applying CudaVersion.V12_8 only when building a GPU resource config (and leaving it unset/ignored for CPU + client modes).

if self.image is not None:
    kwargs["imageName"] = self.image

if self.min_cuda_version is not None:

Copilot AI Mar 24, 2026


minCudaVersion is added to kwargs whenever self.min_cuda_version is non-None, even for CPU endpoints where it will be cleared/excluded later. To reduce confusion and keep manifests/payloads minimal, consider only including this field when building GPU resource configs (e.g., guard with if not is_cpu and ...).

Suggested change:
- if self.min_cuda_version is not None:
+ if not is_cpu and self.min_cuda_version is not None:

scaler_type: Optional[ServerlessScalerType] = None,
scaler_value: int = 4,
template: Optional[PodTemplate] = None,
min_cuda_version: Optional[str] = None,

Copilot AI Mar 24, 2026


The SDK reference lists min_cuda_version: Optional[str] = None in the Endpoint(...) signature and parameter table, but the implementation currently defaults to CUDA 12.8. Please align the documentation with the actual constructor behavior (or adjust the constructor to match the documented default).

Suggested change:
- min_cuda_version: Optional[str] = None,
+ min_cuda_version: Optional[str] = "12.8",
