R/RStudio
R is a GNU project providing a freely available language and environment for statistical computing and graphics. It offers a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, and more.
RStudio is an IDE (Integrated Development Environment) for R.
HPC Examples
See the HPC Examples GitLab repository, which includes all of the scripts below.
Choosing an R Version
Run the command below to see the available R versions.
$ module spider R
-----------------------------------
R:
-----------------------------------
Versions:
R/3.5
R/3.6
R/4.0
R/4.2
R/4.3
RStudio GUI via Command Line
Slow
The RStudio GUI can be slow on the cluster. CRC recommends running RStudio on your desktop and running R from the command line on the cluster.
First, you will need to connect to the KU Community Cluster or Hawk cluster via X11 or X2Go.
After connecting, you will be on one of the submit nodes. Do not launch RStudio on the submit nodes: any process there is killed automatically after 60 minutes of CPU time. Instead, submit an interactive job, which connects you to a compute node.
Once on the compute node, you can then launch RStudio with the following command:
$ module load R/4.2
$ module load rstudio
$ rstudio
R
You must load an R module before launching RStudio.
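If you launch RStudio often, a small shell function can catch a missing R module before RStudio starts. The sketch below is a hypothetical helper (`launch_rstudio` is not a CRC-provided command); it assumes that `module load R/<version>` puts the `R` executable on your `PATH`, checks for it, and only then runs `rstudio`.

```shell
#!/bin/bash
# Hypothetical helper: refuse to start RStudio unless an R module is loaded.
# Assumes "module load R/<version>" puts the R executable on PATH.
launch_rstudio() {
  if ! command -v R >/dev/null 2>&1; then
    echo "No R on PATH; run 'module load R/4.2' (or another version) first." >&2
    return 1
  fi
  # Pass any arguments straight through to RStudio.
  rstudio "$@"
}
```

Put the function in your `~/.bashrc`; on the compute node you would then run `module load R/4.2`, `module load rstudio`, and `launch_rstudio`.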
Installing R Packages
Storage Quota
Use crctool to make sure you have enough quota, in both disk space and file count, for your R packages.
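crctool reports your quota. To see how much an existing R library actually consumes, standard Linux tools are enough. The sketch below is a hypothetical `r_lib_usage` helper built on `du` and `find` (it is not part of crctool); point it at a library directory such as `~/R/x86_64-pc-linux-gnu-library/4.2`.

```shell
#!/bin/bash
# Hypothetical helper: report disk usage and file count of an R library tree.
r_lib_usage() {
  local dir="$1"
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "usage: r_lib_usage DIR (an existing directory)" >&2
    return 1
  fi
  local size files
  # Total size in human-readable units (e.g. 1.2G).
  size=$(du -sh "$dir" | cut -f1)
  # Number of regular files under the tree; tr strips padding some wc versions add.
  files=$(find "$dir" -type f | wc -l | tr -d ' ')
  echo "size=$size files=$files"
}
```

For example, `r_lib_usage ~/R/x86_64-pc-linux-gnu-library/4.2` prints a single `size=... files=...` line you can compare against the crctool output.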
After connecting to the cluster, load the R module for the version you want to use. R will ask whether you want to use a personal library and whether to create it. You are prompted only the first time you install a package with that R version. Because the prompt is interactive, this first install cannot be done from a job submitted with sbatch.
$ module load R/4.2
$ R
R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> install.packages("bitops")
Warning in install.packages("bitops") :
'lib = "/panfs/pfs.local/software/7/install/R/4.2.1/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
'/panfs/pfs.local/home/r557e636/R/x86_64-pc-linux-gnu-library/4.2'
to install packages into? (yes/No/cancel) yes
Different R Versions
R creates a separate personal library for each minor version (4.0, 4.1, 4.2, etc.). Packages installed under version 4.0 do not work in version 4.2, so you need to reinstall them after switching versions.
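R names the personal library after the platform and the major.minor version, as in the transcript above (`~/R/x86_64-pc-linux-gnu-library/4.2` for R 4.2.1). The hypothetical helper below builds that path from a version number, assuming the `x86_64-pc-linux-gnu` platform string shown there. Creating the directory ahead of time should also let `install.packages()` use it without the interactive prompt, because R adds the personal library to its search path only when the directory already exists; in a non-interactive sbatch job you would still need to give a CRAN mirror through the `repos` argument unless the site configuration sets one.

```shell
#!/bin/bash
# Hypothetical helper: print (and optionally create) the default personal
# R library for an R version, e.g. 4.2.1 -> ~/R/x86_64-pc-linux-gnu-library/4.2.
# Assumes the x86_64-pc-linux-gnu platform string from the transcript.
r_user_lib() {
  local version="$1" create="$2" minor
  case "$version" in
    *.*.*) minor="${version%.*}" ;;   # 4.2.1 -> 4.2 (drop the patch level)
    *.*)   minor="$version" ;;        # already major.minor
    *) echo "expected a version like 4.2.1 or 4.2" >&2; return 1 ;;
  esac
  local dir="$HOME/R/x86_64-pc-linux-gnu-library/$minor"
  if [ "$create" = "--create" ]; then
    mkdir -p "$dir" || return 1
  fi
  echo "$dir"
}
```

For example, `r_user_lib 4.2.1 --create` prepares the 4.2 library before you submit a job that installs packages with `module load R/4.2`.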
Install Packages to a Different Location
R packages can be installed in any directory you can write to. This is useful if you want to share an R environment you created with other users: for example, if you install your R packages in your $WORK directory, anyone in that directory's owning group can also use them. Set one of the variables below before you launch R, and R will also look in those locations for packages.
Set an alternate personal library:
export R_LIBS_USER=/path/to/location
Set shared library path:
export R_LIBS_SITE=/path/to/location
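R builds its package search path in this order: the directories in R_LIBS_USER (which defaults to the personal library created above and is replaced by your path when you set it), then the directories in R_LIBS_SITE, then R's own system library. Directories that do not exist when R starts are skipped.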
To list the directories R searches for packages, run the following inside R:
> .libPaths()
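As a sketch of the shared-library setup described above, the hypothetical function below creates a per-version library under a base directory you choose (for example your $WORK directory), gives the group read access, sets the setgid bit so new package directories inherit the group, and exports R_LIBS_USER for the current shell. It creates the directory first because R skips library directories that do not exist at startup. The group permissions are an assumption; adjust them if collaborators should also install packages.

```shell
#!/bin/bash
# Hypothetical helper: create a shareable R library and point R_LIBS_USER at it.
# Usage: use_shared_rlib BASE_DIR R_MINOR_VERSION   (e.g. "$WORK" 4.2)
use_shared_rlib() {
  local base="$1" minor="$2"
  if [ -z "$base" ] || [ -z "$minor" ]; then
    echo "usage: use_shared_rlib BASE_DIR R_MINOR_VERSION" >&2
    return 1
  fi
  # One library per minor version, since 4.0 packages do not work in 4.2.
  local dir="$base/R/$minor"
  # R ignores library directories that do not exist at startup, so create it now.
  mkdir -p "$dir" || return 1
  # Group members can read it; setgid makes new subdirectories inherit the group.
  chmod g+rx,g+s "$dir" || return 1
  export R_LIBS_USER="$dir"
}
```

After `use_shared_rlib "$WORK" 4.2` and `module load R/4.2`, running `.libPaths()` inside R should list that directory first. Collaborators point their own R_LIBS_USER or R_LIBS_SITE at the same path.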
Running R on a Compute Node via sbatch
You need two files: an R script containing your R code, and a submit script that you pass to sbatch to run the R script on a compute node. A minimal example R script:
x <- rnorm(50)
cat("My sample from N(0,1) is:\n")
print(x)
The serial job script below submits this R script, saved as serial_test.R, to the cluster.
#!/bin/bash
#SBATCH --job-name=r_serial # Job name
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --ntasks=1 # Total number of tasks
#SBATCH --cpus-per-task=1 # Number of cores per task
#SBATCH --mem=2gb # Job memory request
#SBATCH --time=0-00:05:00 # Time limit days-hrs:min:sec
module load R/4.2
Rscript serial_test.R
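If you run several R scripts this way, you can generate the submit script instead of editing it by hand. The sketch below is a hypothetical `make_r_job` function (not a CRC tool) that writes a script with the same directives as the example above for any R script and R module; the defaults `R/4.2` and `r_serial.sh` are assumptions you can override.

```shell
#!/bin/bash
# Hypothetical helper: write a serial Slurm submit script for an R script.
# Usage: make_r_job R_SCRIPT [R_MODULE] [OUTPUT_FILE]
make_r_job() {
  local rscript="$1" rmodule="${2:-R/4.2}" out="${3:-r_serial.sh}"
  if [ -z "$rscript" ]; then
    echo "usage: make_r_job R_SCRIPT [R_MODULE] [OUTPUT_FILE]" >&2
    return 1
  fi
  # Unquoted EOF so $rmodule and $rscript expand; directives match the example.
  cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=r_serial # Job name
#SBATCH --partition=sixhour # Partition Name (Required)
#SBATCH --ntasks=1 # Total number of tasks
#SBATCH --cpus-per-task=1 # Number of cores per task
#SBATCH --mem=2gb # Job memory request
#SBATCH --time=0-00:05:00 # Time limit days-hrs:min:sec
module load $rmodule
Rscript "$rscript"
EOF
  echo "$out"
}
```

For example, `make_r_job serial_test.R R/4.2` writes `r_serial.sh`, which you then submit with `sbatch r_serial.sh`.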
Running a Script in Batch Mode
Rscript {filename}
: by default, output is printed to standard output, which ends up in your job's output file (slurm-<jobid>.out in the submit directory unless you set --output); use > outputfile to redirect it elsewhere. R releases before 3.5.0 did not load the methods package under Rscript, so if your script uses functions from methods directly or indirectly, loading it explicitly with library("methods") keeps it working across versions.
R CMD BATCH {filename}
: this is similar to Rscript but automatically sends output to