Problem Description
Hello
I hope this finds you well.
On MI300A in TPX and CPX modes, the amd-smi command correctly identifies the number of GPUs.
The problem is with the size column (attribute), which represents the memory size for each XCD (GPU).
AMD documentation shows that in TPX mode each XCD (GPU) gets 32GB, but the amd-smi command shows 42.66GB.
The same happens in CPX mode: the docs mention 16GB for each XCD, but the amd-smi tool shows 21.33GB.
In SPX mode, the reported size matches the documentation.
What could be the reason for these differences?
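One observation that may help narrow this down: the reported sizes look like an even split of the total memory across the partitions. The sketch below is only arithmetic, not output from any tool. It assumes 128 GB of total HBM on MI300A and 1, 3, and 6 partitions for SPX, TPX, and CPX; those figures and the names `TOTAL_HBM_GB`, `PARTITIONS`, and `even_split_gb` are assumptions for illustration and are not taken from amd-smi.

```python
# Assumption: MI300A exposes 128 GB of HBM in total.
TOTAL_HBM_GB = 128

# Assumption: number of GPU partitions per compute partition mode.
PARTITIONS = {"SPX": 1, "TPX": 3, "CPX": 6}


def even_split_gb(mode: str) -> float:
    """Return the per-partition size if total memory is divided evenly."""
    return TOTAL_HBM_GB / PARTITIONS[mode]


for mode in PARTITIONS:
    # Rounded to two decimals; a tool that truncates would print 42.66 for TPX.
    print(f"{mode}: {even_split_gb(mode):.2f} GB per partition")
```

Under these assumptions, TPX gives about 42.67 GB and CPX about 21.33 GB, close to what amd-smi shows. So the size column may be reporting total memory divided by the partition count rather than the 32GB and 16GB figures in the docs.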
Appreciate your precious time.
Operating System
Linux RHE
CPU
N/A
GPU
MI300A
ROCm Version
6.2.4
ROCm Component
amdsmi
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response