@shivakunv
Contributor

No description provided.

@tariq1890
Contributor

let's wait for PR #420 to get merged. After that, we can update this PR to align with #420

@shivakunv force-pushed the gpumanagerkernelmodulespec branch from c3ca1e4 to 800e42e (January 6, 2026 12:40)
Signed-off-by: Shiva Kumar (SW-CLOUD) <shivaku@nvidia.com>
@shivakunv force-pushed the gpumanagerkernelmodulespec branch from e4c606e to 6ba105b (January 6, 2026 13:10)
@shivakunv
Contributor Author

> let's wait for PR #420 to get merged. After that, we can update this PR to align with #420

Adapted a similar change for vGPU.

Comment on lines +96 to +107
# Unload modules if they're already loaded so we can reload with custom parameters
if [ -f /sys/module/nvidia_vgpu_vfio/refcnt ] || [ -f /sys/module/nvidia/refcnt ]; then
    echo "NVIDIA modules already loaded by installer, unloading to apply custom parameters..."
    rmmod nvidia_vgpu_vfio 2>/dev/null || true
    rmmod nvidia 2>/dev/null || true
fi

echo "Loading NVIDIA driver kernel modules..."
set -o xtrace +o nounset
modprobe nvidia
modprobe nvidia_vgpu_vfio
set +o xtrace -o nounset
Contributor

This change did not make sense to me at first. But then I realized that _install_driver() actually loads the modules (we leverage the nvidia-installer to load the modules, in addition to building and installing the driver).

Question -- If we create the modprobe conf file (/etc/modprobe.d/nvidia.conf) BEFORE calling _install_driver(), what is the observed behavior? Do the modules get loaded with the desired parameters?
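
One way to answer this empirically is to read back the value the loaded driver reports and compare it with what was written to nvidia.conf. The following is a minimal sketch, not part of the PR. It assumes the driver lists its registry keys in /proc/driver/nvidia/params as `Name: value` lines without the `NVreg_` prefix; the helper name `nvidia_param_value` and the `PARAMS_FILE` override are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look up one registry key in the driver's params listing.
# PARAMS_FILE is overridable so the lookup can be tried on a saved copy.
nvidia_param_value() {
    # Assumption: the listing omits the NVreg_ prefix used on the modprobe line.
    local key="${1#NVreg_}"
    local file="${PARAMS_FILE:-/proc/driver/nvidia/params}"

    if [ ! -r "${file}" ]; then
        echo "cannot read ${file} (is the nvidia module loaded?)" >&2
        return 1
    fi
    # Assumption: each line looks like "Name: value".
    # Exit status is 0 only if the key was found.
    awk -F': ' -v k="${key}" '$1 == k { print $2; found = 1 } END { exit !found }' "${file}"
}
```

Running `nvidia_param_value NVreg_EnableGpuFirmwareLogs` once after _install_driver() and once after the explicit reload would show whether the value from nvidia.conf was picked up in each case.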

Contributor Author

  1. I put the else condition inside _create_module_params_conf and tested it; it works fine.
    else
        echo "create nvidia.conf file"
        echo "options nvidia NVreg_EnableGpuFirmwareLogs=3" > ${MODPROBE_CONFIG_DIR}/nvidia.conf
    fi
_create_module_params_conf() {
    echo "Parsing kernel module parameters..."
    _get_module_params

    if [ ${#NVIDIA_MODULE_PARAMS[@]} -gt 0 ]; then
        echo "Configuring nvidia module parameters in ${MODPROBE_CONFIG_DIR}/nvidia.conf"
        echo "options nvidia ${NVIDIA_MODULE_PARAMS[@]}" > ${MODPROBE_CONFIG_DIR}/nvidia.conf
    else
        echo "create nvidia.conf file"
        echo "options nvidia NVreg_EnableGpuFirmwareLogs=3" > ${MODPROBE_CONFIG_DIR}/nvidia.conf
    fi
}

  2. I created a ConfigMap for the nvidia.conf file and tested it, which also works fine (see PR "vgpu-manager: enable kernel module configuration via KernelModuleConfig", gpu-operator#1946).

If I need to test it in other ways, please list the steps.
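
For reference, the config-writing step in the function above can be sketched with explicit quoting. This is a hypothetical variant, not the PR's code: the name `write_nvidia_modprobe_conf` is made up, and it assumes `NVIDIA_MODULE_PARAMS` is a bash array and `MODPROBE_CONFIG_DIR` points at the modprobe.d directory, as in the snippet. Unlike the tested version, it writes nothing when no parameters are given.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the helper discussed above; not the PR's actual code.
write_nvidia_modprobe_conf() {
    local conf="${MODPROBE_CONFIG_DIR:-/etc/modprobe.d}/nvidia.conf"

    if [ "${#NVIDIA_MODULE_PARAMS[@]}" -eq 0 ]; then
        echo "No nvidia module parameters given; leaving ${conf} untouched"
        return 0
    fi
    echo "Configuring nvidia module parameters in ${conf}"
    # [*] joins the array elements with spaces into a single "options" line.
    printf 'options nvidia %s\n' "${NVIDIA_MODULE_PARAMS[*]}" > "${conf}"
}
```

Using `[*]` inside the quotes makes the joining explicit; with `[@]` the original `echo` receives several arguments and happens to produce the same line.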

Contributor

My original comment was questioning whether this code snippet (lines 96-107) is actually needed. Can you remove this snippet and verify whether custom module parameters can still be applied?

Contributor Author

It is needed.
After removing it, the custom module parameter did not take effect and required a system reboot to be applied.
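
This matches how modprobe works: options in modprobe.d are read when a module is loaded, so a module that is already loaded keeps its old parameters until it is unloaded and loaded again. The sequence can be sketched as a dry run that only prints the commands. The helper name `plan_module_reload` and the `SYSFS_MODULE_DIR` override are hypothetical. The unload order follows the reviewed snippet, which removes nvidia_vgpu_vfio before nvidia.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the commands a reload would need.
# It only prints; nothing is actually unloaded or loaded.
plan_module_reload() {
    local root="${SYSFS_MODULE_DIR:-/sys/module}"
    local mod

    # Unload in the same order as the reviewed snippet: nvidia_vgpu_vfio, then nvidia.
    for mod in nvidia_vgpu_vfio nvidia; do
        # A directory under /sys/module means the module is currently loaded.
        if [ -d "${root}/${mod}" ]; then
            echo "rmmod ${mod}"
        fi
    done
    # Load again so modprobe picks up the options from modprobe.d.
    for mod in nvidia nvidia_vgpu_vfio; do
        echo "modprobe ${mod}"
    done
}
```

Calling `plan_module_reload` on a node before and after _install_driver() shows whether the installer has already loaded the modules, which is the case where the explicit unload is needed.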


4 participants