
feat: default GPU endpoints to minCudaVersion 12.8#277

Open
KAJdev wants to merge 9 commits into main from zeke/ae-2408-flash-default-gpu-endpoints-to-mincuda-128
Conversation


@KAJdev KAJdev commented Mar 17, 2026

GPU endpoints now default to minCudaVersion = "12.8" so that workers only run on hosts with a sufficiently recent CUDA driver. The value can be overridden per endpoint via Endpoint(min_cuda_version=...) or directly on resource classes. CPU endpoints always have minCudaVersion cleared, and the field is excluded from their API payload.

Validation

minCudaVersion is validated against the CudaVersion enum. Invalid values raise a ValueError listing the accepted versions.
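A minimal, self-contained sketch of that validation pattern (the enum members and function name here are illustrative stand-ins, not the SDK's actual definitions):

```python
from enum import Enum


class CudaVersion(str, Enum):
    """Stand-in for the SDK's CudaVersion enum; members are illustrative."""
    V12_4 = "12.4"
    V12_8 = "12.8"


def validate_min_cuda_version(value: str) -> str:
    """Reject values outside the enum, listing the accepted versions."""
    accepted = [v.value for v in CudaVersion]
    if value not in accepted:
        raise ValueError(
            f"Invalid minCudaVersion {value!r}; accepted versions: {accepted}"
        )
    return value
```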

Closes AE-2408

KAJdev force-pushed the zeke/ae-2408-flash-default-gpu-endpoints-to-mincuda-128 branch from 6b0514a to 8b625a6 on March 17, 2026 at 21:49

@runpod-Henrik runpod-Henrik left a comment


Question: Two test gaps in the plumbing

endpoint.py and resource_provisioner.py both changed to carry minCudaVersion through, but neither has a test:

  • No test for Endpoint(min_cuda_version="12.4")._build_resource_config() → resource gets minCudaVersion="12.4"
  • No test for create_resource_from_manifest with {"minCudaVersion": "12.4"} in the manifest data → resource gets the value

The field-level tests on ServerlessResource are solid, but the plumbing from Endpoint decorator to provisioner is untested end-to-end.
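The two missing tests might look roughly like the sketch below. FakeEndpoint and fake_create_resource_from_manifest are minimal stand-ins so the sketch runs standalone; the real tests would exercise Endpoint and create_resource_from_manifest from the SDK itself.

```python
class FakeEndpoint:
    """Minimal stand-in mirroring the min_cuda_version plumbing under review."""

    def __init__(self, min_cuda_version=None):
        self.min_cuda_version = min_cuda_version

    def _build_resource_config(self):
        kwargs = {}
        if self.min_cuda_version is not None:
            kwargs["minCudaVersion"] = self.min_cuda_version
        return kwargs


def fake_create_resource_from_manifest(data):
    """Stand-in for manifest-driven provisioning: pass minCudaVersion through."""
    resource = {}
    if "minCudaVersion" in data:
        resource["minCudaVersion"] = data["minCudaVersion"]
    return resource


def test_endpoint_plumbs_min_cuda_version():
    config = FakeEndpoint(min_cuda_version="12.4")._build_resource_config()
    assert config["minCudaVersion"] == "12.4"


def test_manifest_plumbs_min_cuda_version():
    resource = fake_create_resource_from_manifest({"minCudaVersion": "12.4"})
    assert resource["minCudaVersion"] == "12.4"
```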


Verdict

Clean, well-structured change with good test coverage on the field itself. The two plumbing tests above would close the remaining coverage gap.


@runpod-Henrik runpod-Henrik left a comment


1. Core change — clean

minCudaVersion added to ServerlessResource (default "12.8"), exposed as min_cuda_version on Endpoint, cleared for CPU, validated against the CudaVersion enum, plumbed through manifest → provisioner → GraphQL query, and included in _hashed_fields / _has_structural_changes. Test coverage is solid — 10 tests in TestMinCudaVersion covering defaults, overrides, validation, CPU clearing, hash and structural-change behaviour.

2. Issue: Existing GPU endpoints get silently re-provisioned on next deploy

minCudaVersion is in _hashed_fields. Existing deployed endpoints have no minCudaVersion in their stored config. After upgrading, the first flash deploy sees the new "12.8" default as a structural change and triggers re-provisioning for every GPU endpoint — even if nothing else changed. For busy production endpoints that's an unexpected rolling restart with no warning.
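The mechanism is easy to see in isolation: once a new field joins the hashed set, a stored config that predates it can no longer reproduce the same hash. A self-contained sketch (the SDK's actual hashing scheme may differ):

```python
import hashlib
import json


def config_hash(cfg: dict) -> str:
    """Hash a config dict deterministically (illustrative, not the SDK's exact scheme)."""
    return hashlib.sha256(json.dumps(cfg, sort_keys=True).encode()).hexdigest()


# A config stored before the upgrade has no minCudaVersion...
stored = {"gpuCount": 1, "imageName": "worker:v1"}
# ...but after upgrading, the "12.8" default is included in the hash input.
current = {**stored, "minCudaVersion": "12.8"}

# The hashes differ, so drift detection reports a structural change and
# re-provisions the endpoint even though the user changed nothing.
assert config_hash(stored) != config_hash(current)
```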

3. Issue: No way to opt out of the "12.8" floor

In _build_resource_config:

if self.min_cuda_version is not None:
    kwargs["minCudaVersion"] = self.min_cuda_version

None means "don't include in kwargs", which causes ServerlessResource to fall back to its "12.8" default. A user who passes Endpoint(min_cuda_version=None) expecting to remove the constraint gets "12.8" silently. If there are workloads that need to run on older drivers, they have no path to opt out.
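One conventional fix is a module-level sentinel so the constructor can tell "argument omitted" apart from an explicit None. A sketch of that shape (names and structure are hypothetical, not the SDK's actual implementation):

```python
_UNSET = object()  # sentinel: distinguishes "argument omitted" from explicit None


class Endpoint:
    """Sketch of a constructor with an opt-out path for the CUDA floor."""

    def __init__(self, min_cuda_version=_UNSET):
        if min_cuda_version is _UNSET:
            min_cuda_version = "12.8"  # GPU default applies only when omitted
        self.min_cuda_version = min_cuda_version  # explicit None means "no floor"

    def _build_resource_config(self):
        # Always forward the value, including an explicit None, so the
        # resource layer does not silently fall back to its "12.8" default.
        return {"minCudaVersion": self.min_cuda_version}
```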

Nit: SDK Reference shows None as the constructor default

min_cuda_version: Optional[str] = None

But the table note says "GPU endpoints default to "12.8" when not set." The effective default for GPU is "12.8" — the None in the Endpoint signature is an implementation detail. Users who see = None and try passing it explicitly to clear the constraint will be confused when it doesn't work. Consider documenting the signature as min_cuda_version: Optional[str] = None # GPU endpoints default to "12.8" or updating the table default column to show "12.8" for GPU.


Verdict: PASS WITH NITS

The implementation is correct. Items 2 and 3 are worth a quick look before merge — particularly whether the silent re-provision on upgrade is acceptable or needs a migration note in the changelog.

🤖 Reviewed by Henrik's AI-Powered Bug Finder


Copilot AI left a comment


Pull request overview

This PR updates the serverless resource model and provisioning pipeline so GPU endpoints default to minCudaVersion = "12.8", while CPU endpoints always clear minCudaVersion and exclude it from API payloads/manifests. It also adds validation to ensure minCudaVersion is one of the supported CudaVersion enum values.

Changes:

  • Default GPU ServerlessResource.minCudaVersion to "12.8" and validate it against CudaVersion (invalid values raise ValueError with accepted versions).
  • Ensure CPU endpoint variants clear minCudaVersion and treat it as an input-only/excluded field for payload + drift/hash logic.
  • Thread minCudaVersion through Endpoint → manifest extraction → manifest provisioning → GraphQL saveEndpoint response selection, with unit tests and docs updated accordingly.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Summary per file:

  • tests/unit/resources/test_serverless_cpu.py — Adds assertions that CPU endpoints clear minCudaVersion and exclude it from hash/payload inputs.
  • tests/unit/resources/test_serverless.py — Adds a focused test suite for minCudaVersion defaults, hashing/structural drift behavior, and validation errors.
  • tests/unit/resources/test_cpu_load_balancer.py — Ensures CPU load balancer endpoints clear/exclude minCudaVersion consistently with other GPU-only fields.
  • src/runpod_flash/runtime/resource_provisioner.py — Allows manifest-driven provisioning to pass through minCudaVersion when present.
  • src/runpod_flash/endpoint.py — Adds min_cuda_version to the public Endpoint(...) API and forwards it into the resource config payload.
  • src/runpod_flash/core/resources/serverless_cpu.py — Clears minCudaVersion for CPU endpoints and excludes it from CPU payload input-only fields.
  • src/runpod_flash/core/resources/serverless.py — Adds the minCudaVersion field default/validation and includes it in hashed/structural change detection.
  • src/runpod_flash/core/resources/load_balancer_sls_resource.py — Excludes minCudaVersion from CPU load balancer payload input-only fields.
  • src/runpod_flash/core/api/runpod.py — Includes minCudaVersion in the saveEndpoint GraphQL selection set.
  • src/runpod_flash/cli/commands/build_utils/manifest.py — Extracts minCudaVersion into generated manifests when present (non-None).
  • docs/Resource_Config_Drift_Detection.md — Documents minCudaVersion as a GPU-only field excluded from CPU hashing.
  • docs/GPU_Provisioning.md — Documents the Endpoint min_cuda_version ↔ API minCudaVersion mapping and default behavior.
  • docs/Flash_SDK_Reference.md — Documents the new Endpoint(..., min_cuda_version=...) parameter and its semantics.



promptless bot commented Mar 23, 2026

📝 Documentation updates detected!

New suggestion: Document min_cuda_version parameter for Flash GPU endpoints




Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.




deanq commented Mar 23, 2026

Should we also add 12.9 and 13.0 in the CudaVersion enum at this point?
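For reference, that extension would presumably be two more members on the enum; a sketch with hypothetical member names and abridged existing values (the actual enum lives in the SDK):

```python
from enum import Enum


class CudaVersion(str, Enum):
    """Sketch of the enum with the proposed additions (existing members abridged)."""
    V12_4 = "12.4"
    V12_8 = "12.8"
    V12_9 = "12.9"  # proposed addition
    V13_0 = "13.0"  # proposed addition
```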

@KAJdev KAJdev requested a review from deanq March 24, 2026 19:20

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.



Comment on lines 364 to +403
@@ -405,6 +400,7 @@ def __init__(
        self._explicit_scaler_type = scaler_type
        self.scaler_value = scaler_value
        self.template = template
        self.min_cuda_version = min_cuda_version

Copilot AI Mar 24, 2026


min_cuda_version defaults to CudaVersion.V12_8 for all endpoints, including CPU and pure-client (id=) cases. This conflicts with the SDK reference (docs show default None) and the stated behavior that only GPU endpoints should default to 12.8. Consider making the constructor default None and applying CudaVersion.V12_8 only when building a GPU resource config (and leaving it unset/ignored for CPU + client modes).

if self.image is not None:
    kwargs["imageName"] = self.image

if self.min_cuda_version is not None:

Copilot AI Mar 24, 2026


minCudaVersion is added to kwargs whenever self.min_cuda_version is non-None, even for CPU endpoints where it will be cleared/excluded later. To reduce confusion and keep manifests/payloads minimal, consider only including this field when building GPU resource configs (e.g., guard with if not is_cpu and ...).

Suggested change:
- if self.min_cuda_version is not None:
+ if not is_cpu and self.min_cuda_version is not None:

scaler_type: Optional[ServerlessScalerType] = None,
scaler_value: int = 4,
template: Optional[PodTemplate] = None,
min_cuda_version: Optional[str] = None,

Copilot AI Mar 24, 2026


The SDK reference lists min_cuda_version: Optional[str] = None in the Endpoint(...) signature and parameter table, but the implementation currently defaults to CUDA 12.8. Please align the documentation with the actual constructor behavior (or adjust the constructor to match the documented default).

Suggested change:
- min_cuda_version: Optional[str] = None,
+ min_cuda_version: Optional[str] = "12.8",
