Quick Start Guide

This quick start guide will go through the minimum steps it takes to access the cluster, check resources available, create a submit script, and run a job on the cluster.

Access Request

If you have not yet submitted an access request for any cluster, you will need to do so.

Pre-Requisites

Linux Basics

Each cluster uses a Linux Operating System called CentOS. In this quick start guide, we do not cover the basics of Linux. Here are some external resources that you may find useful to provide you with such training.

Tutorials

There is a large number of Linux tutorials online including:

Connecting

Find detailed instructions on how to connect from major operating systems. For the purposes of this quick start guide, we will connect by SSH.

Storage

After successful login to the cluster, you will find yourself in the directory /home/$USER, where $USER is your KU Online ID. This is your home directory and serves as the repository for your personal files, and configurations. You can reference your home directory by ~ or $HOME.

Your home directory is located on a shared file system. Therefore, all files and directories are always available on all cluster nodes. Disk space is managed by quotas. By default, each user has 50GB of disk space available. Running the command crctool will you show how much space you have used and have left.

Other storage volumes located on the cluster are $WORK and $SCRATCH. Read the storage overview for more details.

Transfer Data

At some point, you will probably need to copy files between your local computer and the cluster. There are different ways to achieve this, depending on your local operating system (OS). Transferring data can be done via SCP, SFTP, or Globus

Software

Software on the cluster is controlled by modules. Unlike a traditional computer, software needs to be loaded in your environment before it is ready to use. This simplifies the management of environment variables associated with various software packages and also allows for multiple versions of the same software.

List of all available modules
```
module avail
```

To use the lastest python version, we will put the module load python command in a submit script.

Partitions

Each owner group is given a partition with the nodes for which that owner has purchased. You will have access to at least one owner group partition and the sixhour partition. To view all partitions you are eligible to submit jobs to, run crctool.

This job will be ran in the sixhour partition, but can easily be changed to an owner group partition.

#SBATCH --partition=sixhour

Submit Script

It's now time for your first job script. To do some work on the cluster, you require certain resources (e.g. CPUs and memory) and a description of the computations to be done. A job consists of instructions to the scheduler in the form of option flags, and statements that describe the actual tasks. Let's start with the instructions to the scheduler.

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --partition=sixhour
#SBATCH --mem-per-cpu=2GB

# Put your code below this line

The first line makes sure that the file is executed using the bash shell. The remaining lines are option flags used by the sbatch command. The page Jobs Submission outlines the most important options of sbatch.

Now, lets write a Hello World script.

# Put your code below this line
module load conda
mkdir $WORK/my_first_job
cd $WORK/my_first_job
python -c "print('Hello World')" > hello.txt

After loading the conda module, we create a new directory my_first_job within $WORK directory. The variable $WORK expands to /kuhpc/work// . Then, we change directory to the newly created directory. In the fourth line we print the line Hello World and redirect the output to a file named hello.txt. Save the contents of the script to a file named first.sh.

The complete job script looks like this:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --partition=sixhour
#SBATCH --mem-per-cpu=2GB

# Put your code below this line
module load conda
mkdir $WORK/my_first_job
cd $WORK/my_first_job
python -c "print('Hello World')" > hello.txt

Submitting Job

We can now submit our first job to the scheduler. The scheduler will then provide the requested resources to the job. If all requested resources are already available, then your job can start immediately. Otherwise your job will wait until enough resources are available. We submit our job to the scheduler using the sbatch command:

sbatch first.sh
Submitted batch job 32490640

If the job is submitted successfully, the command outputs a job-ID with which you can refer to your job later on.

Monitor Jobs

You can inspect the state of our active jobs (running or pending) with the squeue command:

squeue --job=32490640

      JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   32490640   sixhour    job01 testuser  R       0:22      1 r10r10n03

Here you can see that the with job-ID 32490640 is in state RUNNING (R). The job is running in partition sixhour on r10r10n03 for 22 seconds. It is also possible that the job can not start immediately after submitting it to SLURM because the requested resources are not yet available. In this case, the output could look like this:

squeue --job=32490640

       JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    32490640   sixhour    job01 testuser PD       0:00      1 (Priority)

Here you can see that the job is in state PENDING (PD) and a reason why the job is pending. In this example, the job has to wait for at least one other job with higher priority.

You can always list all your active (pending or running) jobs with squeue:

squeue --user=testuser

      JOBID   PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   34651451     sixhour slurm.sh  testuser PD       0:00      2 (Priority)
   34651453     sixhour slurm.sh  testuser PD       0:00      2 (Priority)
   29143227     sixhour     Rjob  testuser PD       0:00      4 (JobHeldUser)
   37856328     sixhour   mpi.sh  testuser  R       4:38      2 r11r12n[012-014]
   32634559     sixhour  fast.sh  testuser  R    2:52:37      1 r16r12n01
   32634558     sixhour  fast.sh  testuser  R    3:00:54      1 r14r14n03
   32634554     sixhour  fast.sh  testuser  R    4:11:26      1 r08r20n02
   32633556     sixhour  fast.sh  testuser  R    4:36:10      1 r08r20n03

Training

CRC regulary holds trainings for various topics. We are also available for one on one sessions or can present to a lab or group if wanted. crchelp@ku.edu is best way to contact us if you have any question regarding using the cluster.