The NVIDIA System Management Interface (nvidia-smi) is a command-line utility, built on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.
This utility allows administrators to query GPU device state and, with the appropriate privileges, to modify it. It is targeted at the Tesla™, GRID™, Quadro™, and Titan X products, though limited support is also available on other NVIDIA GPUs.
nvidia-smi ships with the NVIDIA GPU display drivers on Linux. It can report query information as XML or human-readable plain text to either standard output or a file. For more details, please refer to the nvidia-smi documentation.
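For example, a full device report can be written as XML to a file for later parsing (the filename below is an arbitrary choice):
$ nvidia-smi -q -x -f gpu_report.xml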
- Example Output
- Querying GPU Status
- Monitoring and Managing GPU Boost
- Reviewing System/GPU Topology and NVLink with nvidia-smi
- Printing all GPU Details
Example Output
# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:18:00.0 Off | 0 |
| N/A 41C P0 57W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... Off | 00000000:3B:00.0 Off | 0 |
| N/A 37C P0 53W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... Off | 00000000:86:00.0 Off | 0 |
| N/A 38C P0 57W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... Off | 00000000:AF:00.0 Off | 0 |
| N/A 42C P0 57W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Querying GPU Status
The GPUs shown above are NVIDIA’s high-performance compute GPUs, and they provide a good deal of health and status information.
List all available NVIDIA devices
$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
List certain details about each GPU
$ nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
index, name, uuid, serial
0, Tesla V100-SXM2-32GB, GPU-5a80af23-787c-cbcb-92de-c80574883c5d, 1562720002969
1, Tesla V100-SXM2-32GB, GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4, 1562520023800
2, Tesla V100-SXM2-32GB, GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04, 1562420015554
3, Tesla V100-SXM2-32GB, GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f, 1562520023100
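The same query interface is convenient for lightweight logging. As a sketch (the field list and the 5-second interval are arbitrary choices), the following prints utilization and memory figures every 5 seconds, without headers or units:
$ nvidia-smi --query-gpu=timestamp,index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits -l 5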
Monitor overall GPU usage with 1-second update intervals
$ nvidia-smi dmon
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 57 42 39 0 0 0 0 877 1290
1 54 38 38 0 0 0 0 877 1290
2 57 38 38 0 0 0 0 877 1290
3 57 43 41 0 0 0 0 877 1290
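dmon also accepts options to select metric groups, change the sampling interval, and stop after a fixed number of samples. For example (values chosen arbitrarily), sample power/temperature, utilization, clocks, and memory every 5 seconds, 12 times:
$ nvidia-smi dmon -s pucm -d 5 -c 12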
Monitor per-process GPU usage with 1-second update intervals
$ nvidia-smi pmon
# gpu pid type sm mem enc dec command
# Idx # C/G % % % % name
0 14835 C 45 15 0 0 python
1 14945 C 64 50 0 0 python
2 - - - - - - -
3 - - - - - - -
In this case, two different python processes are running, one on each GPU; only two of the four GPUs are in use.
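As with dmon, pmon accepts a sampling delay (-d) and a sample count (-c); a single snapshot of per-process usage can be taken with:
$ nvidia-smi pmon -c 1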
Monitoring and Managing GPU Boost
The GPU Boost feature, which NVIDIA has included with more recent GPUs, allows the GPU clocks to vary depending upon load (achieving maximum performance so long as power and thermal headroom are available). However, the amount of available headroom will vary by application (and even by input file!), so users should keep their eyes on the status of the GPUs. A listing of available clock speeds can be shown for each GPU (in this case, a Tesla V100):
$ nvidia-smi -q -d SUPPORTED_CLOCKS
==============NVSMI LOG==============
Timestamp : Mon Nov 23 18:48:39 2020
Driver Version : 450.51.06
CUDA Version : 11.0
Attached GPUs : 4
GPU 00000000:18:00.0
Supported Clocks
Memory : 877 MHz
Graphics : 1530 MHz
Graphics : 1522 MHz
Graphics : 1515 MHz
Graphics : 1507 MHz
[...180 additional clock speeds omitted...]
Graphics : 150 MHz
Graphics : 142 MHz
Graphics : 135 MHz
As shown, the Tesla V100 GPU supports 187 different clock speeds (from 135 MHz to 1530 MHz). However, only one memory clock speed is supported (877 MHz). Some GPUs support two different memory clock speeds (one high speed and one power-saving speed). Typically, such GPUs only support a single GPU clock speed when the memory is in the power-saving speed (which is the idle GPU state). On all recent Tesla and Quadro GPUs, GPU Boost automatically manages these speeds and runs the clocks as fast as possible (within the thermal/power limits and any limits set by the administrator).
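On GPUs that expose application clocks, an administrator can also pin the clocks to one of the supported memory,graphics pairs listed above and later restore the defaults. Root privileges are typically required; the 877,1530 pair below is taken from the V100 listing above, and -i 0 limits the change to the first GPU:
$ sudo nvidia-smi -i 0 -ac 877,1530
$ sudo nvidia-smi -i 0 -rac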
To review the current GPU clock speed, default clock speed, and maximum possible clock speed, run the following (output shown for one GPU):
$ nvidia-smi -q -d CLOCK
==============NVSMI LOG==============
Timestamp : Mon Nov 23 18:56:48 2020
Driver Version : 450.51.06
CUDA Version : 11.0
Attached GPUs : 4
GPU 00000000:18:00.0
Clocks
Graphics : 1290 MHz
SM : 1290 MHz
Memory : 877 MHz
Video : 1170 MHz
Applications Clocks
Graphics : 1290 MHz
Memory : 877 MHz
Default Applications Clocks
Graphics : 1290 MHz
Memory : 877 MHz
Max Clocks
Graphics : 1530 MHz
SM : 1530 MHz
Memory : 877 MHz
Video : 1372 MHz
Max Customer Boost Clocks
Graphics : 1530 MHz
SM Clock Samples
Duration : 0.01 sec
Number of Samples : 4
Max : 1290 MHz
Min : 135 MHz
Avg : 870 MHz
Memory Clock Samples
Duration : 0.01 sec
Number of Samples : 4
Max : 877 MHz
Min : 877 MHz
Avg : 877 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
...
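The same clock readings are also exposed as scripted query fields, assuming these field names are supported by your driver version:
$ nvidia-smi --query-gpu=index,clocks.gr,clocks.sm,clocks.mem,clocks.max.gr,clocks.max.sm,clocks.max.mem --format=csv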
Ideally, you’d like all clocks to be running at the highest speed all the time. However, this will not be possible for all applications. To review the current state of each GPU and any reasons for clock slowdowns, use the PERFORMANCE flag:
$ nvidia-smi -q -d PERFORMANCE
Attached GPUs : 4
GPU 00000000:18:00.0
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
...
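For scripted monitoring, the throttle reasons can also be polled as query fields (again assuming these field names are supported by your driver version):
$ nvidia-smi --query-gpu=index,clocks_throttle_reasons.active,clocks_throttle_reasons.sw_power_cap,clocks_throttle_reasons.hw_slowdown --format=csv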
Reviewing System/GPU Topology and NVLink with nvidia-smi
To properly take advantage of more advanced NVIDIA GPU features (such as GPU Direct), it is vital that the system topology be properly configured. The topology refers to how the various system devices (GPUs, InfiniBand HCAs, storage controllers, etc.) connect to each other and to the system’s CPUs. Certain topology types will reduce performance or even cause certain features to be unavailable. To help tackle such questions, nvidia-smi supports system topology and connectivity queries:
$ nvidia-smi topo --matrix
GPU0 GPU1 GPU2 GPU3 mlx5_0 mlx5_1 CPU Affinity NUMA Affinity
GPU0 X NV2 NV2 NV2 NODE NODE 0,2,4,6,8,10 0
GPU1 NV2 X NV2 NV2 NODE NODE 0,2,4,6,8,10 0
GPU2 NV2 NV2 X NV2 SYS SYS 1,3,5,7,9,11 1
GPU3 NV2 NV2 NV2 X SYS SYS 1,3,5,7,9,11 1
mlx5_0 NODE NODE SYS SYS X PIX
mlx5_1 NODE NODE SYS SYS PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
Reviewing this section will take some getting used to, but it can be very valuable. The configuration above shows four Tesla V100 GPUs and two Mellanox EDR InfiniBand HCAs (mlx5_0 and mlx5_1). GPU0, GPU1, and both HCAs are attached to the first CPU (NUMA node 0), while GPU2 and GPU3 are attached to the second CPU (NUMA node 1); all four GPUs are connected to each other by pairs of NVLinks (NV2). The CPU Affinity column lists the cores local to each GPU (e.g., cores 0, 2, 4, 6, 8, 10 for GPU0 and GPU1), and jobs should generally be pinned to the cores local to the GPU they use (although this will vary by application).
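In practice, this affinity information is used to pin a job to the cores and memory local to its GPU. A minimal sketch using numactl (the application name is a placeholder, and the node number should match the NUMA Affinity column for the GPU in use):
$ numactl --cpunodebind=0 --membind=0 ./my_gpu_app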
The NVLink connections themselves can also be queried to check their status, capabilities, and health. Readers are encouraged to consult NVIDIA documentation to better understand the specifics.
$ nvidia-smi nvlink --status
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)
Link 0: 25.781 GB/s
Link 1: 25.781 GB/s
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s
Link 4: 25.781 GB/s
Link 5: 25.781 GB/s
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
Link 0: 25.781 GB/s
Link 1: 25.781 GB/s
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s
Link 4: 25.781 GB/s
Link 5: 25.781 GB/s
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
Link 0: 25.781 GB/s
Link 1: 25.781 GB/s
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s
Link 4: 25.781 GB/s
Link 5: 25.781 GB/s
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
Link 0: 25.781 GB/s
Link 1: 25.781 GB/s
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s
Link 4: 25.781 GB/s
Link 5: 25.781 GB/s
The capabilities of each link can be listed in the same way:
$ nvidia-smi nvlink --capabilities
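Both queries can be restricted to a single GPU when the full listing is too verbose (assuming the nvlink subcommand in your driver version accepts the -i option):
$ nvidia-smi nvlink --status -i 0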
Printing all GPU Details
To list all available data on a particular GPU, specify the ID of the card with -i. The first command below prints the complete report; the second restricts it to selected sections:
$ nvidia-smi -i 0 -q
$ nvidia-smi -i 0 -q -d MEMORY,UTILIZATION,POWER,CLOCK,COMPUTE
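To discover which fields the scripted --query-gpu interface supports on a given driver, nvidia-smi can list them itself:
$ nvidia-smi --help-query-gpu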