NVIDIA-smi ships with NVIDIA GPU display drivers on Linux. nvidia-smi can report a wide range of status and health information for each GPU.

## Querying GPU Status

The examples below use Tesla cards, NVIDIA's high-performance compute GPUs, which expose a good deal of health and status information.

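For a quick one-off check, plain `nvidia-smi` with no arguments prints a summary table of every GPU, its temperature, memory use, and running processes. A minimal sketch of keeping that view live with the standard `watch` utility:

```
# One-shot summary table for all GPUs
$ nvidia-smi

# Redraw the same summary every second (Ctrl-C to stop)
$ watch -n 1 nvidia-smi
```
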
### List all available NVIDIA devices

```
$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
```

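Since `nvidia-smi -L` prints exactly one line per GPU, the output is easy to reuse in scripts. For example (a small sketch assuming a POSIX shell, output abbreviated):

```
# Count the GPUs in the machine
$ nvidia-smi -L | wc -l
4

# Pull out just the UUIDs
$ nvidia-smi -L | sed 's/.*UUID: \(.*\))$/\1/'
GPU-5a80af23-787c-cbcb-92de-c80574883c5d
...
```
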
### List certain details about each GPU

```
$ nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
index, name, uuid, serial
0, Tesla V100-SXM2-32GB, GPU-5a80af23-787c-cbcb-92de-c80574883c5d, 1562720002969
1, Tesla V100-SXM2-32GB, GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4, 1562520023800
2, Tesla V100-SXM2-32GB, GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04, 1562420015554
3, Tesla V100-SXM2-32GB, GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f, 1562520023100
```

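`--query-gpu` accepts many more fields than shown here (`nvidia-smi --help-query-gpu` lists them), and adding `noheader` and `nounits` to `--format` makes the output script-friendly. A sketch querying utilization and memory (the numbers are illustrative):

```
# Bare CSV, one line per GPU: index, GPU utilization %, memory used (MiB)
$ nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
0, 97, 11349
1, 0, 11
...
```
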
### Monitor overall GPU usage with 1-second update intervals

```
$ nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    57    42    39     0     0     0     0   877  1290
    1    54    38    38     0     0     0     0   877  1290
    2    57    38    38     0     0     0     0   877  1290
    3    57    43    41     0     0     0     0   877  1290
```

*in this sample all four GPUs are idle; under load, the `sm` column reports the percentage of the CUDA SM "cores" in use*

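`dmon` has options to control the sampling; as a sketch (the exact metric groups depend on the driver version, see `nvidia-smi dmon -h`), `-d` sets the interval in seconds, `-s` picks metric groups, and `-c` exits after a fixed number of samples:

```
# Power/temperature (p) and utilization (u) every 5 seconds, 12 samples, then exit
$ nvidia-smi dmon -d 5 -s pu -c 12
```
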
### Monitor per-process GPU usage with 1-second update intervals

```
$ nvidia-smi pmon
# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    0      14835     C    45    15     0     0   python
    1      14945     C    64    50     0     0   python
    2          -     -     -     -     -     -   -
    3          -     -     -     -     -     -   -
```

*in this case, two different python processes are running, one on each GPU; only 2 of the 4 GPUs are in use*

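`pmon` takes the same style of options; a one-sample invocation (a sketch, see `nvidia-smi pmon -h` for what your driver supports) is handy for scripted snapshots of per-process usage:

```
# Single snapshot of per-process usage, then exit
$ nvidia-smi pmon -c 1
```
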
## Useful commands