Measuring Memory and CPU Usage
Making sure your jobs use the right amount of RAM and the right number of CPUs helps you and others using the clusters use these resources more efficiently, and in turn get work done more quickly. Below are some examples of how to measure your CPU and RAM usage so you can make this happen. Be sure to check the example SLURM submission scripts so you request the correct amount of each resource.
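For reference, here is a minimal sketch of a submission script that sets these requests explicitly. The job name, resource values, and program are placeholders to adjust for your own work:
#!/bin/bash
#SBATCH --job-name=example            # placeholder job name
#SBATCH --partition=sixhour           # use a partition you have access to
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=5             # match the number of CPUs your program actually uses
#SBATCH --mem=1G                      # match your measured memory high-water mark, with some headroom
#SBATCH --time=0-01:00:00             # D-HH:MM:SS
./my_program                          # placeholder for your actual command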
CPU Percentage Used
By default, this is the percentage of a single CPU. On multi-core systems, you can see percentages greater than 100%. For example, if 3 cores are each at 60% use, top will show a CPU usage of 180%.
Future Jobs
If you launch a program by putting /usr/bin/time -v in front of it, time will watch your program and provide statistics about the resources it used. Check "Percent of CPU this job got:" for how much CPU was used, and "Maximum resident set size (kbytes)" for how much RAM the job used. For example:
/usr/bin/time -v stress -c 8 -t 10s
stress: info: [17958] dispatching hogs: 8 cpu, 0 io, 0 vm, 0 hdd
stress: info: [17958] successful run completed in 10s
Command being timed: "stress -c 8 -t 10s"
Percent of CPU this job got: 796%
Maximum resident set size (kbytes): 2368
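Note that /usr/bin/time writes its report to standard error, so inside a batch job you may want to save it to a file. One way, using GNU time's -o option and a placeholder program name:
/usr/bin/time -v -o time_stats.txt ./my_program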
Running Jobs
If your job is already running, you can check on its current usage, but you will have to wait until it has finished to find the maximum memory and CPU used. The easiest way to check the instantaneous memory and CPU usage of a job is to ssh to a compute node your job is running on. To find the node to ssh to, run:
squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1654654 sixhour abc123 r557e636 R 0:24 1 n259
Then use ssh to connect to a node your job is running on from the NODELIST column:
ssh n259
SSH to compute node
To access a compute node via ssh, you must have a job running on that compute node. Your ssh session will be bound by the same CPU, memory, and time limits your job requested.
Once you are on the compute node, run either ps or top.
ps
ps will give you instantaneous usage every time you run it. Here is some sample ps output:
ps -u $USER -o %cpu,rss,args
%CPU RSS COMMAND
0.0 588 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
98.2 204 stress -c 5 -t 10000s
ps reports memory used in kilobytes, so each of the 5 stress worker processes is using 204 KB of RAM. They are also using most of 5 cores, so future jobs like this should request 5 CPUs.
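If you want to sample usage repeatedly instead of rerunning ps by hand, one option is to wrap it in watch, which reruns the command at a fixed interval (every 5 seconds here):
watch -n 5 "ps -u $USER -o %cpu,rss,args"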
top
top runs interactively and shows you live usage statistics. You can press u, enter your KU Online ID, then press Enter to filter to just your processes. For memory usage, the number you are interested in is RES. In the example below, the igblastn and perl programs are each consuming from 46 MB to 348 MB of memory, and each is fully utilizing one CPU. You can press q to quit.
top - 23:29:16 up 112 days, 1:00, 1 user, load average: 5.17, 5.16, 5.15
Tasks: 647 total, 6 running, 641 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.5%us, 1.1%sy, 0.0%ni, 73.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 125.989G total, 122.164G used, 3917.367M free, 388.625M buffers
Swap: 0.000k total, 0.000k used, 0.000k free, 118.752G cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16273 r557e636 20 0 96068 48m 5812 R 100.0 0.0 250:31.93 igblastn
16167 r557e636 20 0 316m 196m 1252 R 100.0 0.2 0:45.35 perl
16309 r557e636 20 0 468m 348m 1376 R 100.0 0.3 59:57.89 perl
16384 r557e636 20 0 94256 46m 5836 R 100.0 0.0 248:26.95 igblastn
16214 r557e636 20 0 194m 74m 1252 R 99.7 0.1 0:16.94 perl
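As a shortcut, on most Linux systems you can also start top already filtered to your own processes:
top -u $USER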
Completed Jobs
Slurm records statistics for every job, including how much memory and CPU were used.
seff
After the job completes, you can run seff jobid to get some useful information about your job, including the memory used and what percent of your allocated memory that amounts to.
seff 1620511
Job ID: 1620511
Cluster: ku_community_cluster
User/Group: r557e636/r557e636_g
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 8-19:03:16
CPU Efficiency: 99.87% of 8-19:19:34 core-walltime
Job Wall-clock time: 8-19:19:34
Memory Utilized: 66.96 MB
Memory Efficiency: 0.82% of 8.00 GB
The job above requested 1 core and 8 GB of memory. It utilized its 1 core at 99.87% efficiency, but used only 0.82% of the 8 GB of memory requested. Future runs of this job can probably request much less memory if the input data is the same.
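Based on those numbers, a future submission of the same job could keep the single core but request far less memory. For example (1G is an illustrative value, not a recommendation):
#SBATCH --ntasks=1
#SBATCH --mem=1G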
Note
If your job requests email to be sent for END or FAIL mail types, the seff information about that job will be sent in the body of the email.
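For example, adding directives like these to your submission script turns on those emails (the address is a placeholder):
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your_email@example.edu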
sacct
You can also use the more flexible sacct to get that information, along with other more advanced job queries.
sacct -j 1620511 -o "JobID%20,JobName,User,Partition,NodeList,Elapsed,State,ExitCode,MaxRSS,AllocTRES%32"
JobID JobName User Partition NodeList Elapsed State ExitCode MaxRSS AllocTRES
-------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- --------------------------------
1620511 paper2tes+ r557e636 biostat n146 8-19:19:34 COMPLETED 0:0 billing=1,cpu=1,mem=8G,node=1
1620511.batch batch n146 8-19:19:34 COMPLETED 0:0 68572K cpu=1,mem=8G,node=1
1620511.extern extern n146 8-19:19:34 COMPLETED 0:0 616K billing=1,cpu=1,mem=8G,node=1
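MaxRSS above is reported in kilobytes. If you prefer other units, recent versions of sacct can convert the display for you; for example, this should report MaxRSS in megabytes:
sacct -j 1620511 -o "JobID%20,MaxRSS" --units=M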