feat: default GPU endpoints to minCudaVersion 12.8 (#277)
Conversation
Force-pushed from 6b0514a to 8b625a6.
runpod-Henrik
left a comment
Question: Two test gaps in the plumbing
`endpoint.py` and `resource_provisioner.py` both changed to carry `minCudaVersion` through, but neither has a test:
- No test for `Endpoint(min_cuda_version="12.4")` → `_build_resource_config()` → resource gets `minCudaVersion="12.4"`
- No test for `create_resource_from_manifest` with `{"minCudaVersion": "12.4"}` in the manifest data → resource gets the value
The field-level tests on ServerlessResource are solid, but the plumbing from Endpoint decorator to provisioner is untested end-to-end.
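The two missing plumbing tests could look roughly like this. `Endpoint` and `create_resource_from_manifest` below are minimal stand-ins with illustrative bodies, not the SDK's actual implementation; only the assertions mirror the coverage gap described above.

```python
# Stand-in for the real Endpoint class (illustrative, not SDK code).
class Endpoint:
    def __init__(self, min_cuda_version=None):
        self.min_cuda_version = min_cuda_version

    def _build_resource_config(self):
        kwargs = {}
        if self.min_cuda_version is not None:
            kwargs["minCudaVersion"] = self.min_cuda_version
        return kwargs


def create_resource_from_manifest(data):
    # Stand-in: the real provisioner builds a ServerlessResource from
    # the manifest dict; here we just pass the fields through.
    return dict(data)


def test_endpoint_plumbs_min_cuda_version():
    config = Endpoint(min_cuda_version="12.4")._build_resource_config()
    assert config["minCudaVersion"] == "12.4"


def test_manifest_plumbs_min_cuda_version():
    resource = create_resource_from_manifest({"minCudaVersion": "12.4"})
    assert resource["minCudaVersion"] == "12.4"


test_endpoint_plumbs_min_cuda_version()
test_manifest_plumbs_min_cuda_version()
print("plumbing tests passed")
```

Against the real SDK these would import `Endpoint` and the provisioner directly and assert on the resulting `ServerlessResource`.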
Verdict
Clean, well-structured change with good test coverage on the field itself. The two plumbing tests above would close the remaining coverage gap.
runpod-Henrik
left a comment
1. Core change — clean
`minCudaVersion` added to `ServerlessResource` (default `"12.8"`), exposed as `min_cuda_version` on `Endpoint`, cleared for CPU, validated against the `CudaVersion` enum, plumbed through manifest → provisioner → GraphQL query, and included in `_hashed_fields` / `_has_structural_changes`. Test coverage is solid — 10 tests in `TestMinCudaVersion` covering defaults, overrides, validation, CPU clearing, hash and structural-change behaviour.
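The enum validation described above can be sketched as follows. The members and error message are illustrative; the SDK's actual `CudaVersion` enum may list different versions.

```python
from enum import Enum


# Illustrative stand-in for the SDK's CudaVersion enum.
class CudaVersion(str, Enum):
    V12_4 = "12.4"
    V12_8 = "12.8"


def validate_min_cuda_version(value: str) -> str:
    # Reject anything not present in the enum, listing accepted values.
    accepted = [v.value for v in CudaVersion]
    if value not in accepted:
        raise ValueError(f"minCudaVersion must be one of {accepted}, got {value!r}")
    return value


print(validate_min_cuda_version("12.8"))
try:
    validate_min_cuda_version("11.8")
except ValueError as exc:
    print(f"rejected: {exc}")
```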
2. Issue: Existing GPU endpoints get silently re-provisioned on next deploy
`minCudaVersion` is in `_hashed_fields`. Existing deployed endpoints have no `minCudaVersion` in their stored config. After upgrading, the first flash deploy sees the new `"12.8"` default as a structural change and triggers re-provisioning for every GPU endpoint — even if nothing else changed. For busy production endpoints that's an unexpected rolling restart with no warning.
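A minimal sketch of why this happens (field names and the hashing scheme are assumed for illustration): adding a defaulted field to the hashed set changes the hash of every already-deployed endpoint's config, even when nothing the user wrote has changed.

```python
import hashlib
import json


# Illustrative config hash; the SDK's real drift detection may differ.
def config_hash(config: dict) -> str:
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


stored = {"gpuCount": 1, "imageName": "app:v1"}     # pre-upgrade stored config
redeployed = {**stored, "minCudaVersion": "12.8"}   # identical deploy, new default

# The hashes differ, so drift detection flags a structural change.
print(config_hash(stored) != config_hash(redeployed))  # True
```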
3. Issue: No way to opt out of the "12.8" floor
In `_build_resource_config`:

```python
if self.min_cuda_version is not None:
    kwargs["minCudaVersion"] = self.min_cuda_version
```

`None` means "don't include in kwargs", which causes `ServerlessResource` to fall back to its `"12.8"` default. A user who passes `Endpoint(min_cuda_version=None)` expecting to remove the constraint silently gets `"12.8"`. If there are workloads that need to run on older drivers, they have no path to opt out.
Nit: SDK Reference shows None as the constructor default
```python
min_cuda_version: Optional[str] = None
```
But the table note says "GPU endpoints default to `"12.8"` when not set." The effective default for GPU is `"12.8"`; the `None` in the Endpoint signature is an implementation detail. Users who see `= None` and try passing it explicitly to clear the constraint will be confused when it doesn't work. Consider documenting the signature as `min_cuda_version: Optional[str] = None  # GPU endpoints default to "12.8"` or updating the table's default column to show `"12.8"` for GPU.
Verdict: PASS WITH NITS
The implementation is correct. Items 2 and 3 are worth a quick look before merge — particularly whether the silent re-provision on upgrade is acceptable or needs a migration note in the changelog.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
Pull request overview
This PR updates the serverless resource model and provisioning pipeline so GPU endpoints default to minCudaVersion = "12.8", while CPU endpoints always clear minCudaVersion and exclude it from API payloads/manifests. It also adds validation to ensure minCudaVersion is one of the supported CudaVersion enum values.
Changes:
- Default GPU `ServerlessResource.minCudaVersion` to `"12.8"` and validate it against `CudaVersion` (invalid values raise `ValueError` with accepted versions).
- Ensure CPU endpoint variants clear `minCudaVersion` and treat it as an input-only/excluded field for payload + drift/hash logic.
- Thread `minCudaVersion` through Endpoint → manifest extraction → manifest provisioning → GraphQL saveEndpoint response selection, with unit tests and docs updated accordingly.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/unit/resources/test_serverless_cpu.py | Adds assertions that CPU endpoints clear minCudaVersion and exclude it from hash/payload inputs. |
| tests/unit/resources/test_serverless.py | Adds a focused test suite for minCudaVersion defaults, hashing/structural drift behavior, and validation errors. |
| tests/unit/resources/test_cpu_load_balancer.py | Ensures CPU load balancer endpoints clear/exclude minCudaVersion consistently with other GPU-only fields. |
| src/runpod_flash/runtime/resource_provisioner.py | Allows manifest-driven provisioning to pass through minCudaVersion when present. |
| src/runpod_flash/endpoint.py | Adds min_cuda_version to the public Endpoint(...) API and forwards it into the resource config payload. |
| src/runpod_flash/core/resources/serverless_cpu.py | Clears minCudaVersion for CPU endpoints and excludes it from CPU payload input-only fields. |
| src/runpod_flash/core/resources/serverless.py | Adds minCudaVersion field default/validation, includes it in hashed/structural change detection. |
| src/runpod_flash/core/resources/load_balancer_sls_resource.py | Excludes minCudaVersion from CPU load balancer payload input-only fields. |
| src/runpod_flash/core/api/runpod.py | Includes minCudaVersion in the saveEndpoint GraphQL selection set. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Extracts minCudaVersion into generated manifests when present (non-None). |
| docs/Resource_Config_Drift_Detection.md | Documents minCudaVersion as a GPU-only field excluded from CPU hashing. |
| docs/GPU_Provisioning.md | Documents Endpoint min_cuda_version ↔ API minCudaVersion mapping and default behavior. |
| docs/Flash_SDK_Reference.md | Documents new Endpoint(..., min_cuda_version=...) parameter and semantics. |
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Should we also add 12.9 and 13.0 in the CudaVersion enum at this point?
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.
```diff
@@ -405,6 +400,7 @@ def __init__(
         self._explicit_scaler_type = scaler_type
         self.scaler_value = scaler_value
         self.template = template
+        self.min_cuda_version = min_cuda_version
```
min_cuda_version defaults to CudaVersion.V12_8 for all endpoints, including CPU and pure-client (id=) cases. This conflicts with the SDK reference (docs show default None) and the stated behavior that only GPU endpoints should default to 12.8. Consider making the constructor default None and applying CudaVersion.V12_8 only when building a GPU resource config (and leaving it unset/ignored for CPU + client modes).
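The suggestion above can be sketched as follows. `build_resource_config` and `GPU_DEFAULT_MIN_CUDA` are illustrative names, not the SDK's internals: the constructor keeps `None` as its default, and the `"12.8"` floor is applied only when a GPU resource config is actually built.

```python
GPU_DEFAULT_MIN_CUDA = "12.8"


def build_resource_config(min_cuda_version, is_cpu):
    """Stand-in for Endpoint._build_resource_config (illustrative)."""
    kwargs = {}
    if is_cpu:
        # CPU and pure-client modes never carry minCudaVersion.
        return kwargs
    # GPU path: apply the default only here, not in the constructor.
    kwargs["minCudaVersion"] = (
        min_cuda_version if min_cuda_version is not None else GPU_DEFAULT_MIN_CUDA
    )
    return kwargs


print(build_resource_config(None, is_cpu=False))    # {'minCudaVersion': '12.8'}
print(build_resource_config("12.4", is_cpu=False))  # {'minCudaVersion': '12.4'}
print(build_resource_config(None, is_cpu=True))     # {}
```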
```python
if self.image is not None:
    kwargs["imageName"] = self.image

if self.min_cuda_version is not None:
```
minCudaVersion is added to kwargs whenever self.min_cuda_version is non-None, even for CPU endpoints where it will be cleared/excluded later. To reduce confusion and keep manifests/payloads minimal, consider only including this field when building GPU resource configs (e.g., guard with if not is_cpu and ...).
```diff
-if self.min_cuda_version is not None:
+if not is_cpu and self.min_cuda_version is not None:
```
```python
scaler_type: Optional[ServerlessScalerType] = None,
scaler_value: int = 4,
template: Optional[PodTemplate] = None,
min_cuda_version: Optional[str] = None,
```
The SDK reference lists min_cuda_version: Optional[str] = None in the Endpoint(...) signature and parameter table, but the implementation currently defaults to CUDA 12.8. Please align the documentation with the actual constructor behavior (or adjust the constructor to match the documented default).
```diff
-min_cuda_version: Optional[str] = None,
+min_cuda_version: Optional[str] = "12.8",
```
GPU endpoints default to `minCudaVersion = "12.8"` to ensure workers only run on hosts with a recent CUDA driver. The value can be overridden per endpoint via `Endpoint(min_cuda_version=...)` or directly on resource classes. CPU endpoints always have `minCudaVersion` cleared and excluded from their API payload.

Validation

`minCudaVersion` is validated against the `CudaVersion` enum. Invalid values raise a `ValueError` listing the accepted versions.

Closes AE-2408