Measuring Memory and CPU Usage
Making sure your jobs use the right amount of RAM and the right number of CPUs helps you and others using the clusters use these resources more efficiently, and in turn get work done more quickly. Below are some examples of how to measure your CPU and RAM usage so you can make this happen. Be sure to check the example SLURM submission scripts so you request the correct amount of each resource.
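For reference, here is a minimal sketch of a submission script that sets these requests explicitly. The job name, resource values, and program are placeholders to adjust for your own work:
#!/bin/bash
#SBATCH --job-name=example            # placeholder job name
#SBATCH --partition=sixhour           # use a partition you have access to
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=5             # match the number of CPUs your program actually uses
#SBATCH --mem=1G                      # match your measured memory high-water mark, with some headroom
#SBATCH --time=0-01:00:00             # D-HH:MM:SS
./my_program                          # placeholder for your actual command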
CPU Percentage Used
By default, this is the percentage of a single CPU. On multi-core systems, you can see percentages greater than 100%. For example, if 3 cores are each at 60% use, top will show a CPU usage of 180%.
Future Jobs
If you launch a program by putting /usr/bin/time -v in front of it, time will watch your program and provide statistics about the resources it used. Check "Percent of CPU this job got:" for how much CPU was used, and "Maximum resident set size (kbytes)" for how much RAM the job used. For example:
/usr/bin/time -v stress -c 8 -t 10s
stress: info: [17958] dispatching hogs: 8 cpu, 0 io, 0 vm, 0 hdd
stress: info: [17958] successful run completed in 10s
Command being timed: "stress -c 8 -t 10s"
Percent of CPU this job got: 796%
Maximum resident set size (kbytes): 2368
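Note that /usr/bin/time writes its report to standard error, so inside a batch job you may want to save it to a file. One way, using GNU time's -o option and a placeholder program name:
/usr/bin/time -v -o time_stats.txt ./my_program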
Running Jobs
If your job is already running, you can check on its current usage, but you will have to wait until it has finished to find the maximum memory and CPU used. The easiest way to check the instantaneous memory and CPU usage of a job is to ssh to a compute node your job is running on. To find the node to ssh to, run:
squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1654654 sixhour abc123 r557e636 R 0:24 1 n259
Then use ssh to connect to a node your job is running on from the NODELIST column:
ssh n259
SSH to compute node
To access a compute node via ssh, you must have a job running on that compute node. Your ssh session will be bound by the same CPU, memory, and time limits your job requested.
Once you are on the compute node, run either ps or top.
ps
ps will give you instantaneous usage every time you run it. Here is some sample ps output:
ps -u $USER -o %cpu,rss,args
%CPU RSS COMMAND
0.0 588 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
ps reports memory used in kilobytes, so each of the 5 stress worker processes is using 204 KB of RAM. They are also using most of 5 cores, so future jobs like this should request 5 CPUs.
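If you want to sample usage repeatedly instead of rerunning ps by hand, one option is to wrap it in watch, which reruns the command at a fixed interval (every 5 seconds here):
watch -n 5 "ps -u $USER -o %cpu,rss,args"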
top
top runs interactively and shows you live usage statistics. You can press u, enter your KU Online ID, then press Enter to filter to just your processes. For memory usage, the number you are interested in is RES. In the example below, the igblastn and perl programs are each consuming from 46 MB to 348 MB of memory, and each is fully utilizing one CPU. You can press q to quit.
top - 23:29:16 up 112 days, 1:00, 1 user, load average: 5.17, 5.16, 5.15
Tasks: 647 total, 6 running, 641 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.5%us, 1.1%sy, 0.0%ni, 73.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 125.989G total, 122.164G used, 3917.367M free, 388.625M buffers
Swap: 0.000k total, 0.000k used, 0.000k free, 118.752G cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16273 r557e636 20 0 96068 48m 5812 R 100.0 0.0 250:31.93 igblastn
16167 r557e636 20 0 316m 196m 1252 R 100.0 0.2 0:45.35 perl
16309 r557e636 20 0 468m 348m 1376 R 100.0 0.3 59:57.89 perl
16384 r557e636 20 0 94256 46m 5836 R 100.0 0.0 248:26.95 igblastn
16214 r557e636 20 0 194m 74m 1252 R 99.7 0.1 0:16.94 perl
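As a shortcut, on most Linux systems you can also start top already filtered to your own processes:
top -u $USER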
Completed Jobs
Slurm records statistics for every job, including how much memory and CPU were used.
seff
After the job completes, you can run seff jobid to get some useful information about your job, including the memory used and what percent of your allocated memory that amounts to.
seff 1620511
Job ID: 1620511
Cluster: ku_community_cluster
User/Group: r557e636/r557e636_g
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 8-19:03:16
CPU Efficiency: 99.87% of 8-19:19:34 core-walltime
Job Wall-clock time: 8-19:19:34
Memory Utilized: 66.96 MB
Memory Efficiency: 0.82% of 8.00 GB
The job above requested 1 core and 8 GB of memory. It utilized its 1 core at 99.87% efficiency, but used only 0.82% of the 8 GB of memory requested. Future runs of this job can probably request much less memory if the input data is the same.
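Based on those numbers, a future submission of the same job could keep the single core but request far less memory. For example (1G is an illustrative value, not a recommendation):
#SBATCH --ntasks=1
#SBATCH --mem=1G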
Note
If your job requests email to be sent for END or FAIL mail types, the seff information about that job will be sent in the body of the email.
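For example, adding directives like these to your submission script turns on those emails (the address is a placeholder):
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your_email@example.edu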
sacct
You can also use the more flexible sacct to get that information, along with other more advanced job queries.
sacct -j 1620511 -o "JobID%20,JobName,User,Partition,NodeList,Elapsed,State,ExitCode,MaxRSS,AllocTRES%32"
JobID JobName User Partition NodeList Elapsed State ExitCode MaxRSS AllocTRES
-------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- --------------------------------
1620511 paper2tes+ r557e636 biostat n146 8-19:19:34 COMPLETED 0:0 billing=1,cpu=1,mem=8G,node=1
1620511.batch batch n146 8-19:19:34 COMPLETED 0:0 68572K cpu=1,mem=8G,node=1
1620511.extern extern n146 8-19:19:34 COMPLETED 0:0 616K billing=1,cpu=1,mem=8G,node=1
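MaxRSS above is reported in kilobytes. If you prefer other units, recent versions of sacct can convert the display for you; for example, this should report MaxRSS in megabytes:
sacct -j 1620511 -o "JobID%20,MaxRSS" --units=M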