This repository was archived by the owner on Dec 9, 2025. It is now read-only.

Network devices operating in exclusive mode should fail when pods attempt to share the same device #139

@gauravkghildiyal

Description

Context

DRA permits multiple pods to share a SINGLE ResourceClaim: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#sharing-a-single-resourceclaim

apiVersion: v1
kind: Pod
metadata:
  name: agnhost-1
  namespace: default
spec:
  containers:
  - name: agnhost
    image: registry.k8s.io/e2e-test-images/agnhost:2.39
  resourceClaims:
  - name: rdma-nic
    resourceClaimName: claim-any-rdma-nic  # Same
---
apiVersion: v1
kind: Pod
metadata:
  name: agnhost-2
  namespace: default
spec:
  containers:
  - name: agnhost
    image: registry.k8s.io/e2e-test-images/agnhost:2.39
  resourceClaims:
  - name: rdma-nic
    resourceClaimName: claim-any-rdma-nic  # Same

The scheduler interprets this as both pods requesting access to the exact same device, and it happily marks both pods as holding a reservation of the claim:

apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: claim-any-rdma-nic
  namespace: default
spec:
  devices:
    requests:
    - allocationMode: ExactCount
      count: 1
      deviceClassName: dranet-cloud
      name: request-any-rdma-nic
      selectors:
      - cel:
          expression: device.attributes["dra.net"].rdma == true
status:
  allocation:
    devices:
      results:
      - adminAccess: null
        device: gpu0rdma0
        driver: dra.net
        pool: my-node
        request: request-any-rdma-nic
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - my-node
  reservedFor:
  - name: agnhost-1
    resource: pods
  - name: agnhost-2
    resource: pods
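
Note how status.reservedFor lists both pods against the single allocated device gpu0rdma0. From the driver side, a minimal sketch of detecting such a multi-pod reservation, written against the resource.k8s.io/v1beta1 types used above (the helper name and package are hypothetical illustrations, not dranet's actual code):

package driver

import (
	resourceapi "k8s.io/api/resource/v1beta1"
)

// reservedByMultiplePods reports whether a ResourceClaim is currently
// reserved by more than one pod, based on status.reservedFor.
// Hypothetical helper for illustration only.
func reservedByMultiplePods(claim *resourceapi.ResourceClaim) bool {
	pods := 0
	for _, ref := range claim.Status.ReservedFor {
		if ref.Resource == "pods" {
			pods++
		}
	}
	return pods > 1
}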

Desired behaviour

At the moment, we only support network devices in exclusive mode (which is the correct thing to do), i.e. we move the network device into the pod's network namespace. So, for now, we should fail the pod bootstrapping attempted by the kubelet for the second and any subsequent pods, as sketched below.
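
A minimal sketch of the kind of exclusive-ownership bookkeeping that could back this check when the plugin prepares a claim's devices on the node. All names here (claimOwners, attach, detach) are hypothetical illustrations in Go (dranet's implementation language), not dranet's actual API:

package driver

import (
	"fmt"
	"sync"
)

// claimOwners tracks which pod currently holds each allocated device.
type claimOwners struct {
	mu     sync.Mutex
	owners map[string]string // device name -> pod UID
}

// attach claims a device for podUID and fails if another pod already
// holds it; the kubelet would then surface this error as a failed
// bootstrap for the second and subsequent pods.
func (c *claimOwners) attach(device, podUID string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if owner, ok := c.owners[device]; ok && owner != podUID {
		return fmt.Errorf("device %q is in exclusive mode and already attached to pod %s", device, owner)
	}
	c.owners[device] = podUID
	return nil
}

// detach releases the device when the owning pod is torn down.
func (c *claimOwners) detach(device string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.owners, device)
}

With the example above, attach("gpu0rdma0", "uid-of-agnhost-1") succeeds and attach("gpu0rdma0", "uid-of-agnhost-2") returns an error, so only the first pod gets the device and the second fails loudly instead of starting without it.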

Current behaviour

All pods using the same claim get scheduled and started successfully, but as expected ONLY ONE of them has the requested network device.
