Arrays

A SLURM job array is a collection of jobs that differ from each other by only a single index parameter. Creating a job array provides an easy way to group related jobs together. For example, if a parameter study requires you to run your application five times, each with a different input parameter, you can use a job array instead of writing and submitting five separate SLURM scripts.

Creating a Job Array

The syntax for submitting a job array is sbatch --array=<indexlist>[%<limit>] arrayscript.sh. The %<limit> part is optional.

To create a job array, write a single SLURM script and use the --array flag to specify a range for the index parameter, either on the sbatch command line or within the script. For example, submitting the following script creates a job array with five sub-jobs:

#!/bin/bash
#SBATCH --job-name=myJobarrayTest
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=5:00
# %A in the output/error file names expands to the parent SLURM_ARRAY_JOB_ID
# and %a to the SLURM_ARRAY_TASK_ID of each sub-job
#SBATCH --output=test_%A_%a.out
#SBATCH --error=test_%A_%a.err
#SBATCH --partition=sixhour
#SBATCH --array=1-5

echo "$SLURM_ARRAY_TASK_ID"

Submitting the script to SLURM will return the parent SLURM_ARRAY_JOB_ID.

$ sbatch job_array_script.sh
Submitted batch job 89

Each sub-job in this job array will have an ID that combines the parent SLURM_ARRAY_JOB_ID and a unique SLURM_ARRAY_TASK_ID, separated by an underscore ("_").

89_1
89_2
89_3
89_4
89_5

To specify that only a certain number of sub-jobs in the array can run at a time, use the percent sign (%) delimiter. In the example below, at most five of the 1000 sub-jobs can run at a time.

$ sbatch --array=1-1000%5 testarray.sh

To submit a specific set of array sub-jobs, use the comma delimiter in the array index list.

$ sbatch --array=1,2,3,4 testarray.sh
$ sbatch --array=1-5,7,10 testarray.sh

To submit a job array with a step size, add a colon (:) and the step to the array range. In the example below, a step size of 2 is requested, so the sub-jobs will have task IDs 2, 4, 6, 8, and 10.

$ sbatch --array=2-10:2 testarray.sh
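
The step size can also be combined with the % limit described above. For example, the command below should create sub-jobs with task IDs 2, 4, 6, 8, and 10 while allowing at most two of them to run at once:

$ sbatch --array=2-10:2%2 testarray.sh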

Checking Status

The status of all sub-jobs in an array can be viewed with squeue. Detailed information for every sub-job can be seen with scontrol show job <SLURM_ARRAY_JOB_ID>.

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             103_1   sixhour myJobarr r557e636  R       4:23      1 h001
             103_2   sixhour myJobarr r557e636  R       4:23      1 h001
             103_3   sixhour myJobarr r557e636  R       4:23      1 h001
             103_4   sixhour myJobarr r557e636  R       4:23      1 h001
             103_5   sixhour myJobarr r557e636  R       4:23      1 h001
$ scontrol show job 103
JobId=103 ArrayJobId=103 ArrayTaskId=5 JobName=myJobarrayTest
...
JobId=107 ArrayJobId=103 ArrayTaskId=4 JobName=myJobarrayTest
...
JobId=106 ArrayJobId=103 ArrayTaskId=3 JobName=myJobarrayTest
...
JobId=105 ArrayJobId=103 ArrayTaskId=2 JobName=myJobarrayTest
...
JobId=104 ArrayJobId=103 ArrayTaskId=1 JobName=myJobarrayTest

Detailed information for a single sub-job can be seen with the command scontrol show job <SLURM_ARRAY_JOB_ID>_<SLURM_ARRAY_TASK_ID>.

$ scontrol show job 103_1
JobId=104 ArrayJobId=103 ArrayTaskId=1 JobName=myJobarrayTest
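
If some sub-jobs are still pending, squeue collapses them into a single line such as 103_[1-5]. The -r (or --array) option asks squeue to display one array element per line instead:

$ squeue -r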

Cancelling a Job Array or Sub-Job

To cancel a job array or a single sub-job, pass the array ID or sub-job ID to the scancel command.

$ scancel 516540
$ scancel 516540_1
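
scancel also accepts a range of task IDs for cancelling several sub-jobs of the same array at once. For example, the command below should cancel sub-jobs 1 through 3 of array 516540 while leaving the rest untouched:

$ scancel 516540_[1-3]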

Example Job Arrays

Suppose you have a file listing paths to files that you wish to perform the same action on. You could submit a single job that loops through the file on one node, or you could submit a job array whose index range matches the number of lines in the file. For this example, our file contains 1000 lines.

#!/bin/bash
#
#SBATCH --ntasks=1
#SBATCH --partition=sixhour
#SBATCH --time=6:00:00
#SBATCH --array=1-1000

# Pull out the line of File.txt whose line number matches this sub-job's task ID
LINE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" File.txt)
echo "$LINE"

call-program-name-here "$LINE"
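
Rather than hard-coding --array=1-1000 when the number of lines changes from run to run, the index range can be computed on the sbatch command line, since command-line options override the #SBATCH directives in the script. A minimal sketch, assuming the script above is saved as line_per_task.sh (a hypothetical file name):

$ sbatch --array=1-$(wc -l < File.txt) line_per_task.sh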

Now say your file contains a list of paths and you need to act on lines 1 and 2 together, then lines 3 and 4, then lines 5 and 6, and so on. Each sub-job therefore handles two consecutive lines.

#!/bin/bash
#
#SBATCH --ntasks=1
#SBATCH --partition=sixhour
#SBATCH --time=6:00:00
#SBATCH --array=1-500

# Job array size will be half of the number of lines in the file

# Each sub-job handles two consecutive lines:
# task 1 -> lines 1 and 2, task 2 -> lines 3 and 4, and so on
SECOND=$((SLURM_ARRAY_TASK_ID * 2))
FIRST=$((SECOND - 1))

LINE1=$(sed -n "${FIRST}p" File.txt)
LINE2=$(sed -n "${SECOND}p" File.txt)

call-program-name-here "$LINE1" "$LINE2"

Finally, suppose you have a file with 30,000 lines and want to do work on each line, but the program you are calling only takes about 30 seconds per line. Instead of having each sub-job handle a single line, have each sub-job loop through 100 lines of the file: rather than 30,000 jobs doing one line each, you submit 300 jobs doing 100 lines each.

#!/bin/bash
#
#SBATCH --ntasks=1
#SBATCH --partition=sixhour
#SBATCH --time=6:00:00
#SBATCH --array=1-300

# Job array size will be number of lines in file divided by 
# number of lines chosen below

# Each sub-job processes NUMLINES consecutive lines:
# task 1 -> lines 1-100, task 2 -> lines 101-200, and so on
NUMLINES=100
STOP=$((SLURM_ARRAY_TASK_ID * NUMLINES))
START=$((STOP - NUMLINES + 1))

echo "START=$START"
echo "STOP=$STOP"

for (( N = $START; N <= $STOP; N++ ))
do
    LINE=$(sed -n "${N}p" File.txt)
    call-program-name-here "$LINE"
done
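
If the number of lines in File.txt is not an exact multiple of NUMLINES, the last sub-job will ask sed for line numbers past the end of the file and get back empty strings. A minimal guard, assuming the file itself contains no blank lines, is to stop the loop as soon as a read comes back empty:

for (( N = $START; N <= $STOP; N++ ))
do
    LINE=$(sed -n "${N}p" File.txt)
    # Stop once we have read past the last line of the file
    [ -z "$LINE" ] && break
    call-program-name-here "$LINE"
done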