## First steps
To submit a job using a submission script:

$ sbatch script.slurm
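When later commands need the job ID, it can be captured at submission time. The sketch below assumes `sbatch --parsable`, which prints only the job ID (optionally followed by `;` and the cluster name). The helper names `parse_job_id` and `submit_and_get_id` are illustrative and not part of Slurm.

```shell
# Extract the job ID from `sbatch --parsable` output ("235" or "235;cluster").
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Submit a script and print the resulting job ID (requires Slurm).
submit_and_get_id() {
    sbatch_out=$(sbatch --parsable "$1") || return 1
    parse_job_id "$sbatch_out"
}

# Example usage on a cluster:
#   JOBID=$(submit_and_get_id script.slurm)
```

The captured value can then be passed to the monitoring and cancellation commands described in this section.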

To monitor jobs that are waiting or running:

$ squeue -u $USER

This command displays information in the following form:

JOBID PARTITION NAME USER ST  TIME NODES NODELIST(REASON)
  235 part_name test abc   R 00:02     1 r6i3n1

where:

- JOBID: job identifier
- PARTITION: partition used
- NAME: job name
- USER: user name of the job owner
- ST: job execution status (R=running, PD=pending, CG=completing)
- TIME: elapsed time
- NODES: number of nodes used
- NODELIST: list of nodes used (for a pending job, the reason it is waiting is shown in parentheses instead)
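With many jobs in the queue, a per-state summary can be easier to read than the full listing. The following is a minimal sketch, assuming the default column layout shown above (ST as the fifth column) and job names without spaces. The function name `count_by_state` is illustrative.

```shell
# Count jobs per state from squeue output read on standard input.
# Assumes the default layout, where ST is the 5th whitespace-separated column.
count_by_state() {
    # Skip the header line, then tally the ST column.
    awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }'
}

# Example usage on a cluster:
#   squeue -u "$USER" | count_by_state
```

The order of the output lines is not guaranteed when several states are present.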

Note: You can use the --start option to display an estimated start time for your jobs (“START_TIME” column). Slurm may not have a reliable estimate for some jobs; in that case, the column shows “N/A”. Because the list of pending jobs changes constantly, the start time given by Slurm is only an estimate and may change with the machine load.

To obtain complete information about a job (allocated resources and execution status):

$ scontrol show job $JOBID
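The output of `scontrol show job` is a series of `Key=Value` pairs, so a single field can be pulled out with standard text tools. This is a sketch under that assumption. The helper name `job_field` is illustrative, and values containing spaces are not handled.

```shell
# Print the value of one "Key=Value" field from `scontrol show job` output
# read on standard input. Values containing spaces are not handled.
job_field() {
    # Put each Key=Value pair on its own line, then print the value of the first match.
    tr -s ' ' '\n' |
        awk -v key="$1" -F= '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Example usage on a cluster:
#   scontrol show job "$JOBID" | job_field JobState
```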

To cancel a job:

$ scancel $JOBID
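Several jobs can also be cancelled at once by filtering on user and state rather than by job ID. The sketch below uses the `--user` and `--state` options of `scancel`. The `DRY_RUN` variable and the function name `cancel_pending` are hypothetical conveniences added here so the command can be checked before it is run.

```shell
# Cancel all pending jobs of the current user.
# With DRY_RUN=1 (a hypothetical switch) the command is printed, not executed.
cancel_pending() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "scancel --user=$USER --state=PENDING"
    else
        scancel --user="$USER" --state=PENDING
    fi
}

# Example usage on a cluster:
#   DRY_RUN=1 cancel_pending   # show what would be run
#   cancel_pending             # actually cancel the pending jobs
```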
## Comments
A complete reference table of Slurm commands is available here.

If a problem occurs on the machine, the default Slurm configuration automatically restarts running jobs from scratch. To avoid this behavior, use the --no-requeue option at submission time, that is, submit your job with:

$ sbatch --no-requeue script.slurm

or add the line

#SBATCH --no-requeue

to your submission script.
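For context, a minimal submission script carrying this directive might look as follows. This is only a sketch: the job name, task count, and time limit are hypothetical values, and the partition or account settings required on a given machine are not shown.

```shell
#!/bin/bash
# Hypothetical job name, task count, and time limit; adjust for your job.
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
# Do not restart the job automatically after a machine problem.
#SBATCH --no-requeue

# SLURM_JOB_ID is set by Slurm inside a running job; fall back when run outside Slurm.
job_label="${SLURM_JOB_ID:-not-in-slurm}"
echo "Job ${job_label} started"
```

The `#SBATCH` lines must appear before the first executable command in the script, since Slurm stops reading directives at that point.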