GPU 00000000:18:00.0
        Display Clock Setting : Not Active
...
```
## Reviewing System/GPU Topology and NVLink with nvidia-smi
To take full advantage of more advanced NVIDIA GPU features (such as GPUDirect), it is vital that the system topology be properly configured. The topology refers to how the various system devices (GPUs, InfiniBand HCAs, storage controllers, etc.) connect to each other and to the system's CPUs. Certain topology types will reduce performance or even make certain features unavailable. To help answer such questions, nvidia-smi supports system topology and connectivity queries:
```
$ nvidia-smi topo --matrix
        GPU0    GPU1    GPU2    GPU3    mlx5_0  mlx5_1  CPU Affinity    NUMA Affinity
GPU0     X      NV2     NV2     NV2     NODE    NODE    0,2,4,6,8,10    0
GPU1    NV2      X      NV2     NV2     NODE    NODE    0,2,4,6,8,10    0
GPU2    NV2     NV2      X      NV2     SYS     SYS     1,3,5,7,9,11    1
GPU3    NV2     NV2     NV2      X      SYS     SYS     1,3,5,7,9,11    1
mlx5_0  NODE    NODE    SYS     SYS      X      PIX
mlx5_1  NODE    NODE    SYS     SYS     PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
```
Reviewing this section takes some getting used to, but it can be very valuable. The configuration above shows four Tesla V100 GPUs and two Mellanox EDR InfiniBand HCAs (`mlx5_0` and `mlx5_1`): GPU0, GPU1, and both HCAs are attached to the first CPU (NUMA node 0), while GPU2 and GPU3 are attached to the second (NUMA node 1), and all four GPUs are linked to one another by pairs of NVLinks (NV2). The CPU Affinity column reports which cores are local to each GPU; on this system the cores are numbered alternately across the two 12-core Xeon sockets, so jobs using GPU0 or GPU1 should be pinned to cores 0,2,4,6,8,10 and jobs using GPU2 or GPU3 to cores 1,3,5,7,9,11 (although the ideal assignment will vary by application).
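This kind of interpretation can also be automated. Below is a minimal sketch (a hypothetical helper, not part of nvidia-smi) that parses the matrix text to recover each GPU's CPU-affinity list, e.g. for use by a job launcher. It assumes the column layout shown above, which may vary across driver versions.

```python
def gpu_cpu_affinity(topo_text):
    """Map each GPU row of `nvidia-smi topo --matrix` output to its CPU core list."""
    affinity = {}
    for line in topo_text.splitlines():
        fields = line.split()
        # Data rows start with a GPU name and contain the 'X' self-marker;
        # the header row and the legend lines do not match both conditions.
        if fields and fields[0].startswith("GPU") and "X" in fields:
            cores = fields[-2]  # CPU Affinity column (NUMA Affinity is last)
            affinity[fields[0]] = [int(c) for c in cores.split(",")]
    return affinity

# Abbreviated sample in the same layout as the output above
sample = """\
        GPU0    GPU1    mlx5_0  CPU Affinity    NUMA Affinity
GPU0     X      NV2     NODE    0,2,4,6,8,10    0
GPU1    NV2      X      NODE    0,2,4,6,8,10    0
mlx5_0  NODE    NODE     X
"""
print(gpu_cpu_affinity(sample))
# {'GPU0': [0, 2, 4, 6, 8, 10], 'GPU1': [0, 2, 4, 6, 8, 10]}
```

Note that the HCA rows are skipped because they carry no affinity columns; only GPU rows with the `X` self-marker are parsed.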
The NVLink connections themselves can also be queried to verify their status, capability, and health. Readers are encouraged to consult NVIDIA's documentation to better understand the specifics.
```
$ nvidia-smi nvlink --status
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
         Link 2: 25.781 GB/s
         Link 3: 25.781 GB/s
         Link 4: 25.781 GB/s
         Link 5: 25.781 GB/s
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
         Link 2: 25.781 GB/s
         Link 3: 25.781 GB/s
         Link 4: 25.781 GB/s
         Link 5: 25.781 GB/s
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
         Link 2: 25.781 GB/s
         Link 3: 25.781 GB/s
         Link 4: 25.781 GB/s
         Link 5: 25.781 GB/s
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
         Link 0: 25.781 GB/s
         Link 1: 25.781 GB/s
         Link 2: 25.781 GB/s
         Link 3: 25.781 GB/s
         Link 4: 25.781 GB/s
         Link 5: 25.781 GB/s
```
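For cluster health checks, output like the above can also be screened programmatically. The sketch below (a hypothetical helper, not an NVIDIA tool) scans `nvidia-smi nvlink --status` text and flags any link that does not report a bandwidth; on some driver versions a down link appears as `<inactive>` rather than a GB/s figure.

```python
def flag_bad_links(status_text):
    """Return (gpu, link) pairs from `nvidia-smi nvlink --status` output
    whose status line does not report a bandwidth in GB/s."""
    bad, gpu = [], None
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("GPU "):
            gpu = line.split(":", 1)[0]  # e.g. 'GPU 0'
        elif line.startswith("Link"):
            label, _, value = line.partition(":")
            if "GB/s" not in value:
                bad.append((gpu, label.strip()))
    return bad

# Abbreviated sample with one link down (hypothetical failure case)
sample = """\
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)
         Link 0: 25.781 GB/s
         Link 1: <inactive>
"""
print(flag_bad_links(sample))
# [('GPU 0', 'Link 1')]
```

On a healthy system like the one shown above, the helper returns an empty list; any non-empty result identifies exactly which GPU and link to investigate.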