Support optional subsystem ID matching in device filters - vGPU#167
mattwittwer wants to merge 5 commits into NVIDIA:main from …
Hey @mattwittwer! Did you get the chance to test this on two GPUs with identical PCI device IDs? Did you get the chance to test this change on an A2/A16? The core logic looks sound and reasonable to me. Do you think you'd also be able to add some tests? Namely, to …
…om breaking the check
…mattwittwer/vgpu-device-manager into mwittwer/add-vgpu-subsystem-id-matching
Hi @karthikvetrivel! I have not been able to try this out directly with A2 and A16 GPUs on the same cluster. The user who reported this issue was able to return their nodes to production by manually setting the labels.
As @rajathagasthya said on the mig-parted PR, this change needs to be made in https://github.com/NVIDIA/go-nvlib and vendored here with a go mod dependency update.
Summary
device-filter
Why
Some GPUs share the same PCI vendor/device ID but differ by subsystem ID, which can cause the wrong config block to match and fail validation. This allows config authors to disambiguate those devices without breaking existing configs.
Changes
go-nvlib PCI discovery to read subsystem vendor/device IDs
DeviceID parsing and matching logic to optionally include subsystem IDs
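
To make the optional matching concrete, here is a minimal Go sketch of the idea. It is not the code in this PR: the `pciID` and `deviceFilter` types, the helper names, and the `"<device+vendor>:<subsystem>"` filter syntax are all hypothetical and chosen only for illustration, and the hex values are placeholders apart from the NVIDIA vendor ID `0x10DE`. The point is that a filter without a subsystem part behaves exactly as before, while a filter with one must also match the subsystem ID.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pciID packs a 16-bit device ID (high half) and a 16-bit vendor ID (low half).
// The same layout is reused here for the subsystem device/vendor pair.
type pciID uint32

// deviceFilter is a hypothetical filter type for this sketch.
// Subsystem is only compared when HasSubsystem is true.
type deviceFilter struct {
	ID           pciID
	Subsystem    pciID
	HasSubsystem bool
}

// parseHex32 parses a hex string such as "0x25B610DE" into a pciID.
func parseHex32(s string) (pciID, error) {
	trimmed := strings.TrimPrefix(strings.ToLower(strings.TrimSpace(s)), "0x")
	v, err := strconv.ParseUint(trimmed, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid PCI ID %q: %w", s, err)
	}
	return pciID(v), nil
}

// parseDeviceFilter accepts "0xDDDDVVVV" or, in the syntax assumed by this
// sketch, "0xDDDDVVVV:0xSSSSVVVV" where the second part is the subsystem ID.
func parseDeviceFilter(s string) (deviceFilter, error) {
	base, sub, hasSub := strings.Cut(s, ":")
	id, err := parseHex32(base)
	if err != nil {
		return deviceFilter{}, err
	}
	f := deviceFilter{ID: id}
	if hasSub {
		subID, err := parseHex32(sub)
		if err != nil {
			return deviceFilter{}, err
		}
		f.Subsystem = subID
		f.HasSubsystem = true
	}
	return f, nil
}

// matches reports whether a discovered device satisfies the filter.
// Filters without a subsystem part ignore the device's subsystem ID.
func (f deviceFilter) matches(id, subsystem pciID) bool {
	if f.ID != id {
		return false
	}
	return !f.HasSubsystem || f.Subsystem == subsystem
}

func main() {
	// Placeholder IDs: two boards sharing a device/vendor ID but not a subsystem ID.
	const shared, boardA, boardB = pciID(0x25B610DE), pciID(0x000110DE), pciID(0x000210DE)

	for _, raw := range []string{"0x25B610DE", "0x25B610DE:0x000110DE"} {
		f, err := parseDeviceFilter(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s: boardA=%v boardB=%v\n", raw, f.matches(shared, boardA), f.matches(shared, boardB))
	}
}
```

Keeping the subsystem part optional is what preserves backward compatibility in this sketch: existing filters parse into a `deviceFilter` with `HasSubsystem` unset, so they continue to match on the device/vendor ID alone.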