Job Handling
Submitting the Job
Submit a SLURM job with the sbatch command. SLURM reads the submit file and schedules the job according to the description in it.
To submit the job described above:
$ sbatch example.sh
Submitted batch job 62
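In scripts it can be handy to capture the job ID that sbatch prints, so that it can be passed to squeue or scancel later. The following is a minimal sketch that assumes the "Submitted batch job <id>" message shown above; the helper name job_id_from_sbatch is hypothetical and not part of SLURM.

```shell
# Extract the numeric job ID from sbatch's "Submitted batch job <id>" message.
# job_id_from_sbatch is a hypothetical helper name, not a SLURM command.
job_id_from_sbatch() {
    # Print the fourth field (the ID) only when the line has the expected form;
    # any other message produces no output.
    printf '%s\n' "$1" | awk '/^Submitted batch job [0-9]+/ { print $4 }'
}

# Example use on a cluster (commented out so the snippet runs anywhere):
# jobid=$(job_id_from_sbatch "$(sbatch example.sh)")
# echo "Submitted as job ${jobid}"
```

sbatch also has a --parsable option that prints the job ID without the surrounding text, which may make the parsing step unnecessary.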
Checking Job Status
To check the status of your job, use the squeue command. It will provide information such as:
- The State (ST) of the job:
  - R - Running
  - PD - Pending - Job is awaiting resource allocation.
  - Additional codes are available on the squeue page.
- Job Name
- Run Time
- Nodes running the job
To check the status of jobs owned by a specific user, use the -u option:
$ squeue -u <username>
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
65 sixhour hello-wo <username> R 0:56 1 g004
To see the status of the jobs in a specific partition, for example a partition you belong to, use the -p option to squeue:
$ squeue -p sixhour
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
73435 sixhour MyRandom jayhawk R 10:35:20 1 r10r29n1
73436 sixhour MyRandom jayhawk R 10:35:20 1 r10r29n1
73735 sixhour SW2_driv bigjay R 10:14:11 1 r31r29n1
73736 sixhour SW2_driv bigjay R 10:14:11 1 r31r29n1
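When a partition holds many jobs, a per-state summary is often easier to read than the full listing. The sketch below assumes the default squeue column order shown above, with the state (ST) in the fifth column, and job names that contain no spaces; count_by_state is a hypothetical helper name.

```shell
# Count jobs per state from default-format squeue output read on stdin.
# count_by_state is a hypothetical helper; it assumes ST is the fifth
# whitespace-separated column and that job names contain no spaces.
count_by_state() {
    # Skip the header line, tally column 5, and print "STATE COUNT" pairs.
    awk 'NR > 1 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Example use on a cluster (commented out so the snippet runs anywhere):
# squeue -p sixhour | count_by_state
```

To list only jobs in one state rather than counting them, squeue itself accepts the -t option, for example squeue -u <username> -t PD.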
Checking Job Start
You can view the estimated start time of your pending jobs with the squeue --start command:
$ squeue --start --user bigjay
JOBID PARTITION NAME USER ST START_TIME NODES NODELIST(REASON)
5822 sixhour Jobname bigjay PD 2018-08-24T00:05:09 3 (Priority)
5823 sixhour Jobname bigjay PD 2018-08-24T00:07:39 3 (Priority)
5824 sixhour Jobname bigjay PD 2018-08-24T00:09:09 3 (Priority)
5825 sixhour Jobname bigjay PD 2018-08-24T00:12:09 3 (Priority)
5826 sixhour Jobname bigjay PD 2018-08-24T00:12:39 3 (Priority)
5827 sixhour Jobname bigjay PD 2018-08-24T00:12:39 3 (Priority)
5828 sixhour Jobname bigjay PD 2018-08-24T00:12:39 3 (Priority)
5829 sixhour Jobname bigjay PD 2018-08-24T00:13:09 3 (Priority)
5830 sixhour Jobname bigjay PD 2018-08-24T00:13:09 3 (Priority)
5831 sixhour Jobname bigjay PD 2018-08-24T00:14:09 3 (Priority)
5832 sixhour Jobname bigjay PD N/A 3 (Priority)
The output shows the expected start time of each job, as well as the reason the jobs are still pending (in this case Priority: the user's jobs have low priority because the user is already running numerous jobs). A START_TIME of N/A means that no estimate is available yet.
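Jobs that show N/A in the START_TIME column have no estimate, and it can be useful to pick them out of a long listing. This is a minimal sketch that assumes the column layout shown above, with START_TIME in the sixth column; jobs_without_estimate is a hypothetical helper name.

```shell
# Print the IDs of jobs whose START_TIME column is N/A.
# jobs_without_estimate is a hypothetical helper; it reads
# "squeue --start" output on stdin and assumes START_TIME is column 6.
jobs_without_estimate() {
    # Skip the header line and print the job ID of every row without an estimate.
    awk 'NR > 1 && $6 == "N/A" { print $1 }'
}

# Example use on a cluster (commented out so the snippet runs anywhere):
# squeue --start --user "$USER" | jobs_without_estimate
```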
Cancel the Job
Cancel a job with the scancel command. In its simplest form, the only argument is the job ID:
$ scancel 2234
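When cancelling several jobs from a script, it may be worth checking the IDs first and previewing the commands before running them. The sketch below is one way to do that; cancel_jobs and the DRY_RUN variable are hypothetical names, and the check accepts plain numeric job IDs only (so array-task IDs such as 2234_1 would be skipped).

```shell
# Cancel one or more jobs after checking that each argument looks like a job ID.
# cancel_jobs and DRY_RUN are hypothetical names used only in this sketch.
cancel_jobs() {
    local id
    for id in "$@"; do
        case "$id" in
            ''|*[!0-9]*)
                # Empty or non-numeric argument: warn and move on.
                echo "skipping invalid job id: $id" >&2
                continue
                ;;
        esac
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scancel $id"   # only print the command that would run
        else
            scancel "$id"        # actually cancel the job
        fi
    done
}

# Example: preview the commands without cancelling anything.
# DRY_RUN=1 cancel_jobs 2234 2235
```

scancel also accepts -u <username> to cancel all jobs belonging to that user, which is simpler when every job should go.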
Job History
The sacct command displays accounting information for currently running jobs as well as for previous jobs. Its output can be customized with options:
$ sacct -u <user>
170 parallel_+ sixhour crc 4 COMPLETED 0:0
170.batch batch crc 4 COMPLETED 0:0
171 parallel_+ sixhour crc 4 CANCELLED+ 0:0
171.batch batch crc 4 CANCELLED 0:15
Show all job information starting from a specific date:
$ sacct --starttime 2014-07-01
Show job accounting information for a specific job (add -l for the long format with all fields):
$ sacct -j <jobid>
$ sacct -j <jobid> -l
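For scripting, the default column layout is awkward to parse because long values are truncated (as in parallel_+ above). sacct can instead emit pipe-delimited output with the --parsable2 option and a chosen set of fields with --format. The sketch below filters such output for jobs that did not complete; not_completed_jobs is a hypothetical helper name, and the field list is just one possible choice.

```shell
# List jobs whose state is not COMPLETED from pipe-delimited sacct output.
# Expects input produced with:
#   sacct -u <user> --parsable2 --noheader --format=JobID,JobName,State,ExitCode
# not_completed_jobs is a hypothetical helper name.
not_completed_jobs() {
    # Field 1 is JobID and field 3 is State. Rows whose ID contains a dot
    # are job steps (such as "170.batch") and are skipped.
    awk -F'|' '$1 !~ /\./ && $3 != "COMPLETED" { print $1, $3 }'
}

# Example use on a cluster (commented out so the snippet runs anywhere):
# sacct -u "$USER" --parsable2 --noheader \
#       --format=JobID,JobName,State,ExitCode | not_completed_jobs
```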
SLURM Commands
Below are some common, useful SLURM commands:
SLURM Command | Function |
---|---|
sacct | Used to report job or job step accounting information about active or completed jobs. |
sinfo | Reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options. |
srun | Used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation. |
squeue | Reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order. |
squeue -u <username> | Display the jobs submitted by the specified <username>. |
squeue -p <partition> | Display the jobs in the specified <partition>. (Will not show jobs running in the sixhour partition that may be running on an owner partition.) |
scontrol show job <jobid> | Check the status of a job (<jobid>). |
squeue --start --job <jobid> | Show an estimate of when your job (<jobid>) might start. |
scontrol show nodes <node_name> | Check the status of a node (<node_name>). |
scancel <jobid> | Cancel a job. |
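The output of scontrol show job is a set of KEY=VALUE pairs, which makes it straightforward to pull out a single field in a script. The sketch below extracts the JobState field; job_state is a hypothetical helper name, and the approach assumes the values of interest contain no spaces.

```shell
# Print the JobState value from "scontrol show job" output read on stdin.
# job_state is a hypothetical helper; it assumes space-separated
# KEY=VALUE pairs and prints one line per job record found.
job_state() {
    # Put each KEY=VALUE pair on its own line, then select the JobState key.
    tr ' ' '\n' | awk -F= '$1 == "JobState" { print $2 }'
}

# Example use on a cluster (commented out so the snippet runs anywhere):
# scontrol show job "$jobid" | job_state
```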