Submitting Jobs
Maximum number of jobs
The maximum number of jobs a user can have submitted at one time is 5000.
Batch jobs
To run a job in batch mode, use your favorite text editor to create a file, called a submission script, which contains SLURM options and the instructions for running your job. All SLURM options are prefaced with #SBATCH. You must specify the partition you wish to run in. Once your script is complete, you can submit the job to the cluster with the sbatch command.
A submission script is simply a text file that contains your job parameters and the commands you wish to execute as part of your job. You can also load modules, set environmental variables, or other tasks inside your submission script.
sbatch example.sh
You may also submit simple jobs directly from the command line:
srun --partition=sixhour echo Hello World!
Command-line options
Command-line options will override SLURM options in your job script.
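For example, assuming example.sh above contains an #SBATCH --time line, supplying a time limit on the command line takes precedence over it (the value here is purely illustrative):

sbatch --time=2:00:00 example.sh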
Interactive jobs
An interactive job allows you to open a shell on the compute node as if you had ssh'd into it. It is usually used for debugging purposes.
To submit an interactive job, use the srun command. Again, you must specify which --partition you wish your job to run in.
srun --time=4:00:00 --ntasks=1 --nodes=1 --partition=sixhour --pty /bin/bash -l
- --time=4:00:00 : 4 hours for the job run
- --ntasks=1 : 1 task. By default, 1 core is given to each task.
- --nodes=1 : 1 node
- --partition=sixhour : Job to run in the sixhour partition
- --pty /bin/bash : Interactive terminal running a /bin/bash shell

The --time, --ntasks, and --nodes flags are called options.
If you have ssh'd to the submit nodes with X11 forwarding enabled and wish to have X11 for an interactive job, then supply the --x11 flag:
srun --time=4:00:00 --ntasks=4 --nodes=1 --partition=sixhour --x11 --pty /bin/bash -l
Default Options
If no SLURM options are given, default options are applied.
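The exact defaults depend on the partition. One way to check them is to query the partition with scontrol and look at fields such as DefaultTime and DefMemPerCPU (the partition name below is just the one used elsewhere on this page):

scontrol show partition sixhour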
Submission Script
To run a job in batch mode on a high-performance computing system using SLURM, first prepare a job script that specifies the application you want to run and the resources required to run it, and then submit the script to SLURM using the sbatch command.
A very basic job script might contain just a bash or tcsh shell script. However, SLURM job scripts most commonly contain at least one executable command preceded by a list of options that specify resources and other attributes needed to execute the command (e.g., wall-clock time, the number of nodes and processors, and filenames for job output and errors). These options are prefaced with the #SBATCH directive, which should precede any executable lines in your job script.
Additionally, your SLURM job script (which will be executed under your preferred login shell) should begin with a line that specifies the command interpreter under which it should run.
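Putting those pieces together, a minimal submission script might look like the sketch below; the partition, time limit, and command are placeholders you would replace with your own:

#!/bin/bash
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --time=0-00:10:00 # Time limit days-hrs:min:sec
#SBATCH --output=hello_%j.log # Standard output and error log
echo "Hello from $(hostname)"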
HPC Examples
Check out the HPC Examples Gitlab repo
Tasks / Cores
Slurm is very explicit in how one requests cores and nodes. While extremely powerful, the three flags --nodes, --ntasks, and --cpus-per-task can be a bit confusing at first.
The term task in this context can be thought of as a process. Therefore, a multi-process program (e.g. MPI) is comprised of multiple tasks, while a multi-threaded program is comprised of a single task, which can in turn use multiple CPUs. In SLURM, tasks are requested with the --ntasks flag. CPUs, for multi-threaded programs, are requested with the --cpus-per-task flag.
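As a quick illustration, both of the sketches below ask for 8 CPU cores in total, but they describe very different jobs (the numbers are placeholders):

# 8 separate tasks (e.g. MPI ranks), 1 core each, possibly spread across nodes
#SBATCH --ntasks=8

# 1 multi-threaded task using 8 cores, which must all be on one node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8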
Examples
Single Core Job
The --mem option can be used to request the appropriate amount of memory for your job. Please make sure to test your application and set this value to a reasonable number based on actual memory use. The %j in the --output line tells SLURM to substitute the job ID in the name of the output file. You can also add --error with an error file name to separate output and error logs.
#!/bin/bash
#SBATCH --job-name=serial_job_test # Job name
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ku.edu # Where to send mail
#SBATCH --ntasks=1 # Run on a single CPU
#SBATCH --mem=1g # Job memory request
#SBATCH --time=0-00:05:00 # Time limit days-hrs:min:sec
#SBATCH --output=serial_test_%j.log # Standard output and error log
pwd; hostname; date
module load python/3.6
echo "Running python script"
python /path/to/your/python/script/script.py
date
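As noted above, if you would like errors written to their own file rather than mixed with the standard output, you could add a directive such as the following to the script (the filename is just an example):

#SBATCH --error=serial_test_%j.err # Standard error log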
Threaded or multi-core job
This script can serve as a template for applications that are capable of using multiple processors on a single server or physical computer. These applications are commonly referred to as threaded, OpenMP, PTHREADS, or shared memory applications. While they can use multiple processors, they cannot make use of multiple servers, and all the processors must be on the same node.
Because these applications require shared memory and can only run on one node, it is important to remember the following:
- You must set --ntasks=1, and then set --cpus-per-task to the number of threads you wish to use.
- You must make the application aware of how many processors to use. How that is done depends on the application:
  - For some applications, set OMP_NUM_THREADS to a value less than or equal to the number of --cpus-per-task you set.
  - For some applications, use a command line option when calling that application.
#!/bin/bash
#SBATCH --job-name=parallel_job # Job name
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ku.edu # Where to send mail
#SBATCH --ntasks=1 # Run a single task
#SBATCH --cpus-per-task=4 # Number of CPU cores per task
#SBATCH --mem-per-cpu=2g # Job memory request
#SBATCH --time=0-00:05:00 # Time limit days-hrs:min:sec
#SBATCH --output=parallel_%j.log # Standard output and error log
pwd; hostname; date
echo "Running on $SLURM_CPUS_PER_TASK cores"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module load StdEnv
/path/to/your/program
MPI job
These are applications that can use multiple processors that may, or may not, be on multiple compute nodes. In SLURM, the --ntasks flag specifies the number of MPI tasks created for your job. Note that, even within the same job, multiple tasks do not necessarily run on a single node. Therefore, requesting the same number of CPUs as above, but with the --ntasks flag, could result in those CPUs being allocated on several, distinct compute nodes.
For many users, differentiating between --ntasks and --cpus-per-task is sufficient. However, for more control over how SLURM lays out your job, you can add the --nodes and --ntasks-per-node flags. --nodes specifies how many nodes to allocate to your job. SLURM will allocate your requested number of cores to a minimal number of nodes on the cluster, so it is extremely likely that if you request a small number of tasks they will all be allocated on the same node. However, to ensure they are on the same node, set --nodes=1 (obviously this is contingent on the number of CPUs, and requesting too many may result in a job that will never run). Conversely, if you would like to ensure a specific layout, such as one task per node for memory, I/O, or other reasons, you can also set --ntasks-per-node=1. Note that the following must be true:
ntasks-per-node * nodes >= ntasks
The job below requests 16 tasks per node, with 2 nodes. By default, each task gets 1 core, so this job uses 32 cores. If the --ntasks=16 option were used instead, the job would only use 16 cores, and those could be on any of the nodes in the partition, even split between multiple nodes.
#!/bin/bash
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --ntasks-per-node=16 # 16 tasks per node with each task given 1 core
#SBATCH --nodes=2 # Run across 2 nodes
#SBATCH --constraint=ib # Only nodes with Infiniband (ib)
#SBATCH --mem-per-cpu=4g # Job memory request
#SBATCH --time=0-06:00:00 # Time limit days-hrs:min:sec
#SBATCH --output=mpi_%j.log # Standard output and error log
echo "Running on $SLURM_JOB_NODELIST nodes using $SLURM_CPUS_ON_NODE cores on each node"
mpirun /path/to/program
GPU
GPU nodes can be requested using the general consumable resource option (--gres=gpu). There are 5 different types of GPU cards in the KU Community Cluster, set up as features. To run on a V100 GPU:
--gres=gpu --constraint=v100
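In a submission script, those options would appear as #SBATCH directives, for example (a sketch; combine them with your other job options):

#SBATCH --gres=gpu # 1 GPU
#SBATCH --constraint=v100 # Only nodes with V100 GPUs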
Multiple GPUs
You may request multiple GPUs by changing the --gres value to --gres=gpu:2. Note that this value is per node. For example, --nodes=2 --gres=gpu:2 will request 2 nodes with 2 GPUs each, for a total of 4 GPUs.
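Expressed as #SBATCH directives, that request would look like the following sketch:

#SBATCH --nodes=2 # 2 nodes
#SBATCH --gres=gpu:2 # 2 GPUs per node (4 GPUs total)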
Single/Double Precision
By default, your job will run on all GPUs in the cluster if using the sixhour partition. This includes GPUs that are only single precision capable. If you need double precision GPUs only, use --constraint=double.
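For example, to request one GPU that is double precision capable, you could combine the two options (a sketch):

#SBATCH --gres=gpu # 1 GPU
#SBATCH --constraint=double # Double precision capable GPUs only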
The job below requests a single GPU node in the sixhour partition:
#!/bin/bash
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --ntasks=1 # 1 task
#SBATCH --time=0-06:00:00 # Time limit days-hrs:min:sec
#SBATCH --gres=gpu # 1 GPU
#SBATCH --output=gpu_%j.log # Standard output and error log
module load singularity
CONTAINERS=/panfs/pfs.local/software/install/singularity/containers
singularity exec --nv $CONTAINERS/tensorflow-gpu-1.9.0.img python ./models/tutorials/image/mnist/convolutional.py