However, if the computations require a large amount of GPU resources (in number …
It is possible to open a terminal directly on an accelerated compute node on which the resources have been reserved for you (here, 1 GPU on the default gpu partition) by using the following command:

```
$ srun --pty --ntasks=1 --cpus-per-task=12 --gres=gpu:1 --hint=nomultithread [--other-options] bash
```
*Comments*

- An interactive terminal is obtained with the `--pty` option.

- The reservation of physical cores is ensured with the `--hint=nomultithread` option (no hyperthreading).

- The memory allocated for the job is proportional to the number of requested CPU cores. For example, if you request 1/4 of the cores of a node, you will have access to 1/4 of its memory. On the default gpu partition on Turing01, the `--cpus-per-task=12` option allows reserving 1/4 of the node memory per GPU. You may consult our documentation on this subject: [Memory allocation on GPU partitions](Memory-allocation-on-GPU-partition).

- `--other-options` contains the usual Slurm options for job configuration (`--time=`, etc.); see the documentation on [batch submission scripts](Batch-job-commands).

- For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project account (project hours allocation) to which the computing hours of the job should be charged.

- We strongly recommend that you consult our documentation detailing computing hours management on Liger to ensure that the hours consumed by your jobs are deducted from the correct allocation.
The terminal is operational after the resources have been granted:
```
[login02 ~]$ srun --pty -p gpus --ntasks=1 --cpus-per-task=12 --gres=gpu:2 --hint=nomultithread bash
[turing01 ~]$ hostname
turing01
[turing01 ~]$ printenv | grep CUDA
CUDA_VISIBLE_DEVICES=0,1
[turing01 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
[turing01 ~]$ nvidia-smi
Tue Dec  1 19:07:10 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   38C    P0    55W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   34C    P0    52W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   34C    P0    55W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   38C    P0    56W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```