[Bug]: vgpu-manager lspci fails in air-gapped environment #616

@daven-que

Description

Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case here.
Case number 01086911

Describe the bug
vgpu-manager lspci fails in an air-gapped environment (on rhel9 and previously rhel8; other environments not tested)

To Reproduce
cd gpu-driver-container\vgpu-manager\rhel9
set VERSION=580.126.08
set OS_TAG=rhcos4.18
docker build --build-arg DRIVER_VERSION=%VERSION% -t vgpu-manager:%VERSION%-%OS_TAG% .
docker save -o vgpu-manager-%VERSION%-%OS_TAG%.tar vgpu-manager:%VERSION%-%OS_TAG%
Upload vgpu-manager-580.126.08-rhcos4.18.tar to the OpenShift (OCP) hosts in an air-gapped environment.
In the OpenShift console, navigate to Operators -> Installed Operators -> NVIDIA GPU Operator -> All instances -> gpu-cluster-policy and update the version to 580.126.08.
The drivers fail to load because lspci cannot open libpci.so.3.
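To confirm which shared library is unresolved on an affected host, the output of ldd can be filtered for unresolved entries. This is a minimal sketch, not part of the repository: missing_libs is a hypothetical helper name, and it assumes ldd is available inside the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from gpu-driver-container): reads ldd output on
# stdin and prints the name of every library reported as "not found".
missing_libs() {
  # ldd prints an unresolved dependency as "<name> => not found"
  awk '/=> not found/ { print $1 }'
}

# Usage inside the vgpu-manager container (binary path taken from this issue):
#   ldd /usr/sbin/lspci | missing_libs
```

If the failure described above is present, this should list libpci.so.3.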

Expected behavior
Drivers should load without errors.

Environment (please provide the following information):

  • gpu-driver-container source (Commit SHA or image digest): sha256:6f8a016ab415bf4cdd8e1868de5ae5cf44e9681c
  • NVIDIA Driver Version: 580.126.08
  • Host OS: OpenShift OCP
  • Kernel Version: rhcos4.18
  • Container Runtime Version: ?
  • CPU Architecture x86_64
  • GPU Model(s) T4/A2/A16

If applicable, also provide:

  • Kubernetes Distro and Version: OpenShift
  • NVIDIA GPU Operator version: 25.10.1

With the https://github.com/NVIDIA/gpu-driver-container code as of 2026-02-18, lspci fails in an air-gapped environment because it cannot load libpci.so.3. In addition, setpci is required and cannot be found. Modifications made in Dec 2025 by users akri3 and Shivkumar Ople partially correct this problem, but they do not work without additional code changes to the file gpu-driver-container\vgpu-manager\rhel9\ocp_dtk_entrypoint.
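A quick way to see which of the two tools is unresolved is a preflight check on PATH. The sketch below is illustrative only: require_tools is a hypothetical function name and is not something the entrypoint script currently defines.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not from ocp_dtk_entrypoint): reports every
# named command that cannot be found on PATH and returns non-zero if any
# of them is missing.
require_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_tools lspci setpci || echo "PCI utilities are incomplete"
```

Note that this only checks that the binaries are found; a binary can be on PATH and still fail at run time if libpci.so.3 cannot be loaded.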

This is the fix we implemented in the source code (gpu-driver-container\vgpu-manager\rhel9\ocp_dtk_entrypoint):

Below line 34 (/usr/sbin/lspci \), add these two lines:
/usr/sbin/setpci \
/lib64/libpci.so.* \

Below line 110 "$DRIVER_TOOLKIT_SHARED_DIR/lspci" \ add this line:
"$DRIVER_TOOLKIT_SHARED_DIR/setpci" \

Below line 113 (export PATH="${DRIVER_TOOLKIT_SHARED_DIR}/bin:$PATH";), add this section of code:

mkdir -p "${DRIVER_TOOLKIT_SHARED_DIR}/lib"
cp -v \
    "$DRIVER_TOOLKIT_SHARED_DIR"/libpci.so.* \
    "${DRIVER_TOOLKIT_SHARED_DIR}/lib"
export LD_LIBRARY_PATH="${DRIVER_TOOLKIT_SHARED_DIR}/lib:$LD_LIBRARY_PATH";
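Taken together, the changes stage both PCI utilities and the libpci shared library, then expose them through PATH and LD_LIBRARY_PATH. The following is a condensed sketch of that idea, not the actual patch: stage_pci_tools and its two arguments are hypothetical, and it collapses the two copy steps of ocp_dtk_entrypoint into one function so it can be tried outside the cluster.

```shell
#!/usr/bin/env bash
# Sketch only: stage_pci_tools and its arguments are hypothetical names,
# not functions or variables from ocp_dtk_entrypoint.
stage_pci_tools() {
  local src_root="$1"    # root containing usr/sbin and lib64 (pass "" for /)
  local shared_dir="$2"  # stands in for DRIVER_TOOLKIT_SHARED_DIR

  mkdir -p "${shared_dir}/bin" "${shared_dir}/lib" || return 1

  # Both binaries named in this issue
  cp "${src_root}/usr/sbin/lspci" "${src_root}/usr/sbin/setpci" \
     "${shared_dir}/bin/" || return 1

  # The library lspci fails to open without this fix
  cp "${src_root}"/lib64/libpci.so.* "${shared_dir}/lib/" || return 1

  export PATH="${shared_dir}/bin:${PATH}"
  # Append the old value only if it is set, to avoid a trailing ":"
  export LD_LIBRARY_PATH="${shared_dir}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}

# Usage (paths are examples):
#   stage_pci_tools "" /mnt/shared-dir
```

The one deliberate difference from the snippet above is the LD_LIBRARY_PATH expansion: when the variable is unset, a plain ":$LD_LIBRARY_PATH" suffix leaves an empty path element, which the dynamic loader treats as the current directory.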

A similar change would be needed in the rhel8\ocp_dtk_entrypoint file.

Labels: bug (Something isn't working)
