Submitting Jobs
Maximum number of jobs
The maximum number of jobs a user can have submitted at one time is 5000.
Batch jobs
To run a job in batch mode, use your favorite text editor to create a file, called a submission script, which contains SLURM options and the instructions for running your job. All SLURM options are prefaced with #SBATCH. You must specify the partition you wish to run in. Once your script is complete, you can submit the job to the cluster with the sbatch command.
A submission script is simply a text file that contains your job parameters and the commands you wish to execute as part of your job. You can also load modules, set environmental variables, or other tasks inside your submission script.
sbatch example.sh
You may also submit simple jobs directly from the command line:
srun --partition=sixhour echo Hello World!
Command-line options
Command-line options will override SLURM options in your job script.
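For example, assuming example.sh above contains an #SBATCH --time line, supplying a time limit on the command line takes precedence over it (the value here is purely illustrative):

sbatch --time=2:00:00 example.sh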
Interactive jobs
An interactive job allows you to open a shell on the compute node as if you had ssh'd into it. It is usually used for debugging purposes.
To submit an interactive job, use the srun command. Again, you must specify which --partition you wish your job to run in.
srun --time=4:00:00 --ntasks=1 --nodes=1 --partition=sixhour --pty /bin/bash -l
- --time=4:00:00 : 4 hours for the job run
- --ntasks=1 : 1 task. By default, 1 core is given to each task.
- --nodes=1 : 1 node
- --partition=sixhour : Job to run in the sixhour partition
- --pty /bin/bash : Interactive terminal running a /bin/bash shell

The --time, --ntasks, and --nodes flags are called options.
If you have ssh'd to the submit nodes with X11 forwarding enabled and wish to have X11 for an interactive job, then supply the --x11 flag:
srun --time=4:00:00 --ntasks=4 --nodes=1 --partition=sixhour --x11 --pty /bin/bash -l
Default Options
If no SLURM options are given, default options are applied.
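The exact defaults depend on the partition. One way to check them is to query the partition with scontrol and look at fields such as DefaultTime and DefMemPerCPU (the partition name below is just the one used elsewhere on this page):

scontrol show partition sixhour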
Submission Script
To run a job in batch mode on a high-performance computing system using SLURM, first prepare a job script that specifies the application you want to run and the resources required to run it, and then submit the script to SLURM using the sbatch command.
A very basic job script might contain just a bash or tcsh shell script. However, SLURM job scripts most commonly contain at least one executable command preceded by a list of options that specify resources and other attributes needed to execute the command (e.g., wall-clock time, the number of nodes and processors, and filenames for job output and errors). These options are prefaced with the #SBATCH directive, which should precede any executable lines in your job script.
Additionally, your SLURM job script (which will be executed under your preferred login shell) should begin with a line that specifies the command interpreter under which it should run.
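Putting those pieces together, a minimal submission script might look like the sketch below; the partition, time limit, and command are placeholders you would replace with your own:

#!/bin/bash
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --time=0-00:10:00 # Time limit days-hrs:min:sec
#SBATCH --output=hello_%j.log # Standard output and error log
echo "Hello from $(hostname)"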
HPC Examples
Check out the HPC Examples Gitlab repo
Tasks / Cores
Slurm is very explicit in how one requests cores and nodes. While extremely powerful, the three flags --nodes, --ntasks, and --cpus-per-task can be a bit confusing at first.
The term task in this context can be thought of as a process. Therefore, a multi-process program (e.g. MPI) is comprised of multiple tasks, while a multi-threaded program is comprised of a single task, which can in turn use multiple CPUs. In SLURM, tasks are requested with the --ntasks flag. CPUs, for multi-threaded programs, are requested with the --cpus-per-task flag.
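As a quick illustration, both of the sketches below ask for 8 CPU cores in total, but they describe very different jobs (the numbers are placeholders):

# 8 separate tasks (e.g. MPI ranks), 1 core each, possibly spread across nodes
#SBATCH --ntasks=8

# 1 multi-threaded task using 8 cores, which must all be on one node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8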
Examples
Single Core Job
The --mem option can be used to request the appropriate amount of memory for your job. Please make sure to test your application and set this value to a reasonable number based on actual memory use. The %j in the --output line tells SLURM to substitute the job ID in the name of the output file. You can also add --error with an error file name to separate output and error logs.
#!/bin/bash
#SBATCH --job-name=serial_job_test # Job name
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ku.edu # Where to send mail
#SBATCH --ntasks=1 # Run on a single CPU
#SBATCH --mem=1g # Job memory request
#SBATCH --time=0-00:05:00 # Time limit days-hrs:min:sec
#SBATCH --output=serial_test_%j.log # Standard output and error log
pwd; hostname; date
module load python/3.6
echo "Running python script"
python /path/to/your/python/script/script.py
date
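As noted above, if you would like errors written to their own file rather than mixed with the standard output, you could add a directive such as the following to the script (the filename is just an example):

#SBATCH --error=serial_test_%j.err # Standard error log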
Threaded or multi-core job
This script can serve as a template for applications that are capable of using multiple processors on a single server or physical computer. These applications are commonly referred to as threaded, OpenMP, PTHREADS, or shared memory applications. While they can use multiple processors, they cannot make use of multiple servers, and all the processors must be on the same node.
Because these applications require shared memory and can only run on one node, it is important to remember the following:
- You must set --ntasks=1, and then set --cpus-per-task to the number of threads you wish to use.
- You must make the application aware of how many processors to use. How that is done depends on the application:
  - For some applications, set OMP_NUM_THREADS to a value less than or equal to the number of --cpus-per-task you set.
  - For some applications, use a command line option when calling that application.
#!/bin/bash
#SBATCH --job-name=parallel_job # Job name
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ku.edu # Where to send mail
#SBATCH --ntasks=1 # Run a single task
#SBATCH --cpus-per-task=4 # Number of CPU cores per task
#SBATCH --mem-per-cpu=2g # Job memory request
#SBATCH --time=0-00:05:00 # Time limit days-hrs:min:sec
#SBATCH --output=parallel_%j.log # Standard output and error log
pwd; hostname; date
echo "Running on $SLURM_CPUS_PER_TASK cores"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module load StdEnv
/path/to/your/program
MPI job
These are applications that can use multiple processors that may, or may not, be on multiple compute nodes. In SLURM, the --ntasks flag specifies the number of MPI tasks created for your job. Note that, even within the same job, multiple tasks do not necessarily run on a single node. Therefore, requesting the same number of CPUs as above, but with the --ntasks flag, could result in those CPUs being allocated on several, distinct compute nodes.
For many users, differentiating between --ntasks and --cpus-per-task is sufficient. However, for more control over how SLURM lays out your job, you can add the --nodes and --ntasks-per-node flags. --nodes specifies how many nodes to allocate to your job. SLURM will allocate your requested number of cores to a minimal number of nodes on the cluster, so it is extremely likely that if you request a small number of tasks they will all be allocated on the same node. However, to ensure they are on the same node, set --nodes=1 (obviously this is contingent on the number of CPUs, and requesting too many may result in a job that will never run). Conversely, if you would like to ensure a specific layout, such as one task per node for memory, I/O, or other reasons, you can also set --ntasks-per-node=1. Note that the following must be true:
ntasks-per-node * nodes >= ntasks
The job below requests 16 tasks per node, with 2 nodes. By default, each task gets 1 core, so this job uses 32 cores. If the --ntasks=16 option were used instead, the job would only use 16 cores, and those could be on any of the nodes in the partition, even split between multiple nodes.
#!/bin/bash
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --ntasks-per-node=16 # 16 tasks per node with each task given 1 core
#SBATCH --nodes=2 # Run across 2 nodes
#SBATCH --constraint=ib # Only nodes with Infiniband (ib)
#SBATCH --mem-per-cpu=4g # Job memory request
#SBATCH --time=0-06:00:00 # Time limit days-hrs:min:sec
#SBATCH --output=mpi_%j.log # Standard output and error log
echo "Running on $SLURM_JOB_NODELIST nodes using $SLURM_CPUS_ON_NODE cores on each node"
mpirun /path/to/program
GPU
GPU nodes can be requested using the general consumable resource option (--gres=gpu). There are 5 different types of GPU cards in the KU Community Cluster, set up as features. To run on a V100 GPU:
--gres=gpu --constraint=v100
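In a submission script, those options would appear as #SBATCH directives, for example (a sketch; combine them with your other job options):

#SBATCH --gres=gpu # 1 GPU
#SBATCH --constraint=v100 # Only nodes with V100 GPUs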
Multiple GPUs
You may request multiple GPUs by changing the --gres value to --gres=gpu:2. Note that this value is per node. For example, --nodes=2 --gres=gpu:2 will request 2 nodes with 2 GPUs each, for a total of 4 GPUs.
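Expressed as #SBATCH directives, that request would look like the following sketch:

#SBATCH --nodes=2 # 2 nodes
#SBATCH --gres=gpu:2 # 2 GPUs per node (4 GPUs total)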
Single/Double Precision
By default, your job will run on all GPUs in the cluster if using the sixhour partition. This includes GPUs that are only single precision capable. If you need double precision GPUs only, use --constraint=double.
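For example, to request one GPU that is double precision capable, you could combine the two options (a sketch):

#SBATCH --gres=gpu # 1 GPU
#SBATCH --constraint=double # Double precision capable GPUs only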
The job below requests a single GPU node in the sixhour partition:
#!/bin/bash
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --ntasks=1 # 1 task
#SBATCH --time=0-06:00:00 # Time limit days-hrs:min:sec
#SBATCH --gres=gpu # 1 GPU
#SBATCH --output=gpu_%j.log # Standard output and error log
module load singularity
CONTAINERS=/panfs/pfs.local/software/install/singularity/containers
singularity exec --nv $CONTAINERS/tensorflow-gpu-1.9.0.img python ./models/tutorials/image/mnist/convolutional.py