It is possible to open a terminal directly on an accelerated compute node on which the resources have been reserved for you, by launching an interactive `bash` session with `srun --pty`.
The terminal is operational after the resources have been granted:

```
[randria@login02 ~]$ srun --pty -p gpus --ntasks=1 --cpus-per-task=12 --gres=gpu:1 --hint=nomultithread bash

[randria@turing01 ~]$ hostname
turing01

[randria@turing01 ~]$ printenv | grep CUDA
CUDA_VISIBLE_DEVICES=0

[randria@turing01 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)

[randria@turing01 ~]$ nvidia-smi
Tue Dec  1 19:11:33 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
...
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   37C    P0    55W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |
...
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   35C    P0    55W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:AF:00.0 Off |                    0 |
...
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

[randria@turing01 ~]$ squeue -j $SLURM_JOB_ID
   JOBID  PARTITION  USER     ST  TIME  NODES  CPUS  QOS     PRIORITY  NODELIST(REASON)  NAME
 1730514  gpus       randria  R   3:03  1      12    normal  396309    turing01          bash

[randria@turing01 ~]$ scontrol show job $SLURM_JOB_ID
JobId=1730514 JobName=bash
   Priority=396309 Nice=0 Account=ici QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   RunTime=00:03:33 TimeLimit=01:00:00 TimeMin=N/A
   Partition=gpus AllocNode:Sid=login02:22331
   NodeList=turing01
   BatchHost=turing01
   NumNodes=1 NumCPUs=12 CPUs/Task=12 ReqB:S:C:T=0:0:*:1
   MinCPUsNode=12 MinMemoryCPU=8G MinTmpDiskNode=0
   Command=bash
```
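The `scontrol show job` output above is a flat list of `Key=Value` pairs, so a single field can be pulled out with standard text tools rather than read by eye. A minimal sketch (the sample line is hard-coded here so that it runs outside of any Slurm job; in a real session you would pipe `scontrol show job $SLURM_JOB_ID` instead):

```
# Extract the TimeLimit field from a captured scontrol-style line.
# Splitting on spaces turns the Key=Value pairs into one pair per line,
# which grep and cut can then filter.
line="RunTime=00:03:33 TimeLimit=01:00:00 TimeMin=N/A"
timelimit=$(echo "$line" | tr ' ' '\n' | grep '^TimeLimit=' | cut -d= -f2)
echo "TimeLimit is $timelimit"
```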
*Comments*

- `CUDA_VISIBLE_DEVICES=0` means that only 1 GPU was allocated here (if we had requested 2 GPUs, it would be `0,1`). You can also use the variable `GPU_DEVICE_ORDINAL`.
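Since `CUDA_VISIBLE_DEVICES` is a comma-separated list of device indices, the number of GPUs allocated to the job can be counted from it directly. A hedged sketch (the value is set by hand below as an example; inside a real job, Slurm exports the variable for you):

```
# Count the GPUs visible to the job by splitting the comma-separated
# list into one index per line and counting the non-empty lines.
CUDA_VISIBLE_DEVICES="0,1"   # example value; set by Slurm in a real job
n_gpus=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .)
echo "GPUs allocated to this job: $n_gpus"
```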