[CP 1267] DCM: mount default ConfigMap when spec.configManager.config is omitted and it's E2Es #508

Open
ci-penbot-01 wants to merge 1 commit into ROCm:main from ci-penbot-01:CP.O2O.pensando.gpu-operator.1267.rocm.gpu-operator.main

@ci-penbot-01
Contributor

Cherry-pick of pensando/gpu-operator#1267


Source PR Description (pensando/gpu-operator#1267):

Summary

When Device Config Manager is enabled on a DeviceConfig:

  • If spec.configManager.config is unset or the referenced name is empty, the DCM DaemonSet mounts ConfigMap default-dcm-config in the DeviceConfig namespace and exposes it at /etc/config-manager/ (typically config.json).
  • If spec.configManager.config.name is set, the DaemonSet mounts that ConfigMap instead; the operator does not create or manage that object—the cluster admin must supply it.
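
The selection rule above can be sketched in a few lines of Go. This is a minimal illustration, not the operator's code: the types `ConfigMapRef` and `ConfigManagerSpec` are simplified, hypothetical stand-ins for the real API types, and the function name `resolveDCMConfigMapName` is invented for this example. Only the ConfigMap name `default-dcm-config` comes from the description.

```go
package main

import "fmt"

// ConfigMapRef and ConfigManagerSpec are simplified, hypothetical stand-ins
// for the DeviceConfig API types; the real structs have more fields.
type ConfigMapRef struct {
	Name string
}

type ConfigManagerSpec struct {
	Config *ConfigMapRef // nil when spec.configManager.config is omitted
}

// defaultDCMConfigMapName is the default ConfigMap name from the PR description.
const defaultDCMConfigMapName = "default-dcm-config"

// resolveDCMConfigMapName (hypothetical name) returns the ConfigMap the DCM
// DaemonSet should mount: the default when config is unset or its name is
// empty, otherwise the admin-supplied name.
func resolveDCMConfigMapName(spec ConfigManagerSpec) string {
	if spec.Config == nil || spec.Config.Name == "" {
		return defaultDCMConfigMapName
	}
	return spec.Config.Name
}

func main() {
	// "site-dcm-config" is an illustrative admin-supplied name.
	custom := ConfigManagerSpec{Config: &ConfigMapRef{Name: "site-dcm-config"}}
	fmt.Println(resolveDCMConfigMapName(ConfigManagerSpec{}))
	fmt.Println(resolveDCMConfigMapName(custom))
}
```

In the explicit case the function only returns the name; consistent with the description, nothing here creates or validates the admin-supplied object.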

New behavior (default ConfigMap lifecycle)

  • Operator: Before reconciling the DCM DaemonSet, the operator ensures default-dcm-config exists when the default mount is in use. If the ConfigMap is missing, it is created with a built-in default config.json (embedded in the operator binary from internal/configmanager/default_dcm_config.json). If it already exists (e.g. from Helm or a prior run), it is not overwritten.
  • Helm: The chart can install the same default-dcm-config in the release namespace by default (defaultDCMConfigMap in values.yaml, new template). Cluster admins can turn this off or override the data for site-specific profiles. The chart's config.json content should stay aligned with the embedded JSON used by the operator.
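
The create-if-missing, never-overwrite lifecycle described above could look roughly like the following. This is a sketch under stated assumptions: `configMapStore` is a hypothetical in-memory stand-in for the Kubernetes API, `ensureDefaultConfigMap` is an invented name, and `embeddedDefaultJSON` is a placeholder rather than the real contents of internal/configmanager/default_dcm_config.json.

```go
package main

import "fmt"

// configMapStore is a hypothetical in-memory stand-in for the Kubernetes API.
// Keys are "namespace/name"; values are the ConfigMap's data.
type configMapStore map[string]map[string]string

const defaultConfigMapName = "default-dcm-config"

// embeddedDefaultJSON is a placeholder. The real default is embedded in the
// operator binary and is not reproduced here.
const embeddedDefaultJSON = `{}`

// ensureDefaultConfigMap (hypothetical name) creates the default ConfigMap
// only when it is missing. An existing object, for example one installed by
// Helm or left by a prior run, is left untouched. It reports whether a new
// object was created.
func ensureDefaultConfigMap(store configMapStore, namespace string) bool {
	key := namespace + "/" + defaultConfigMapName
	if _, exists := store[key]; exists {
		return false
	}
	store[key] = map[string]string{"config.json": embeddedDefaultJSON}
	return true
}

func main() {
	store := configMapStore{}
	// "kube-amd-gpu" is an illustrative namespace.
	fmt.Println(ensureDefaultConfigMap(store, "kube-amd-gpu")) // first call creates
	fmt.Println(ensureDefaultConfigMap(store, "kube-amd-gpu")) // second call is a no-op
}
```

Because the function never updates an existing object, a Helm-installed ConfigMap with site-specific data survives operator reconciles, which matches the "not overwritten" behavior stated above.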

API / docs

  • ConfigManagerSpec.Config doc updated to describe default vs explicit ConfigMap behavior.
  • User-facing updates to docs/dcm/device-config-manager-configmap.md, describing the behavior without internal function or file callouts where appropriate.

E2E

  • ensureDefaultDCMConfigMap: if default-dcm-config already has non-empty config.json (e.g. Helm), tests do not delete/recreate it; otherwise the test fixture is applied.
  • verifyDCMConfigMapVolumeRef: asserts the DCM DaemonSet uses the expected ConfigMap on the config-manager volume.
  • TestDCMDefaultConfigMapWhenConfigOmitted: SIM-friendly check that DCM comes up with no spec.configManager.config and the default ConfigMap name is mounted.
  • SIM vs GPU: skipDCMTestIfSIMRequiresGPU and related skips adjusted so partition / GPU-only tests skip under SIM while K8s-only checks can run.
  • dev.env: E2E_DCM_IMAGE bumped to device-config-manager:v1.4.1 for tests that pull DCM.
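
As an illustration of the two helper checks listed above, here is a small sketch. It does not reproduce the actual test helpers: the `volume` type is a flattened, hypothetical stand-in for a pod volume, the function names are invented, and the volume name `config-manager` is taken from the wording of the description as an assumption.

```go
package main

import "fmt"

// volume is a flattened, hypothetical stand-in for a pod volume.
// ConfigMapName is empty when the volume is not backed by a ConfigMap.
type volume struct {
	Name          string
	ConfigMapName string
}

// checkConfigMapVolumeRef (hypothetical name) returns an error unless the
// named volume exists and references the expected ConfigMap.
func checkConfigMapVolumeRef(volumes []volume, volumeName, wantConfigMap string) error {
	for _, v := range volumes {
		if v.Name != volumeName {
			continue
		}
		if v.ConfigMapName != wantConfigMap {
			return fmt.Errorf("volume %q references ConfigMap %q, want %q",
				volumeName, v.ConfigMapName, wantConfigMap)
		}
		return nil
	}
	return fmt.Errorf("volume %q not found", volumeName)
}

// shouldKeepExisting (hypothetical name) mirrors the fixture rule: an existing
// ConfigMap with a non-empty config.json is kept; otherwise the test fixture
// is applied.
func shouldKeepExisting(data map[string]string) bool {
	return data["config.json"] != ""
}

func main() {
	vols := []volume{{Name: "config-manager", ConfigMapName: "default-dcm-config"}}
	fmt.Println(checkConfigMapVolumeRef(vols, "config-manager", "default-dcm-config"))
	fmt.Println(shouldKeepExisting(map[string]string{"config.json": "{}"}))
}
```

Keeping the fixture decision separate from the volume assertion lets the same assertion run whether the ConfigMap came from Helm, the operator, or the test fixture.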

Test results

cd ~/gpu-operator/tests/e2e
make dcm_e2e
# equivalent: go test -test.timeout=360m -check.f 'TestDCM.*' -v \
#   -deviceConfigName test-deviceconfig -simEnable

OK: 2 passed, 7 skipped
--- PASS: Test (318.06s)
PASS
ok  	github.com/ROCm/gpu-operator/tests/e2e	318.082s

Cherrypick triggered by: ACP-Automation

…d and it's E2Es (#1267)

* DCM: mount default ConfigMap when spec.configManager.config is omitted

When DeviceConfig.spec.configManager.config is nil or has an empty name,
the DCM DaemonSet now always mounts a ConfigMap volume named
default-dcm-config (configurable by setting spec.configManager.config.name).

Add E2E coverage (TestDCMDefaultConfigMapWhenConfigOmitted), cluster_test
helpers, SIM skips for GPU-only partition tests, and align E2E_DCM_IMAGE
in dev.env with v1.4.1.

* Helm default CM + operator EnsureDefaultDCMConfigMap + E2E/docs

* changes

* address comments

* comments

* dcm changes

(cherry picked from commit e9c1e916751b822d24998c7e7763ce4c12de534b)
@ci-penbot-01
Contributor Author

AI-Assisted Cherry-Pick

Source PR: #1267
Target Branch: main

The cherry-pick operation encountered merge conflicts which were resolved automatically using AI assistance.

Files with conflicts (resolved by AI):

  • bundle/manifests/amd-gpu-operator.clusterserviceversion.yaml:38-44
Original conflict in bundle/manifests/amd-gpu-operator.clusterserviceversion.yaml
<<<<<<< HEAD
    containerImage: docker.io/rocm/amd-gpu-operator:dev
    createdAt: "2026-04-02T12:26:30Z"
=======
    containerImage: registry.test.pensando.io:5000/amd-gpu-operator:dev
    createdAt: "2026-04-06T08:31:30Z"
>>>>>>> e9c1e916... DCM: mount default ConfigMap when spec.configManager.config is omitted and it's E2Es (#1267)

Cherry-pick triggered by: ACP-Automation
