AlphaFold
Info
Based on work provided by Research Computing at the University of Virginia.
AlphaFold launch command
Please refer to run_alphafold.py for all available options.
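One way to list those options on the cluster is to ask run_alphafold.py for its flag help from inside the container. This is a minimal sketch only; it assumes the alphafold module defines $CONTAINERDIR and $EBVERSIONALPHAFOLD as used in the launch script below, and relies on the standard absl --helpfull flag to print the full flag list.

# Sketch: print every run_alphafold.py flag from inside the container.
# Assumes the alphafold module sets $CONTAINERDIR and $EBVERSIONALPHAFOLD.
module load singularity alphafold
singularity exec $CONTAINERDIR/alphafold-${EBVERSIONALPHAFOLD}.sif \
    python /app/alphafold/run_alphafold.py --helpfull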
Launch script run
For your convenience, we have prepared a launch script named run that takes care of the Singularity command and the database paths, since these are unlikely to change. If you do need to customize anything, please use the full Singularity command (a sketch is given at the end of the Notes section below).
#!/bin/bash
if [ -z "$ALPHAFOLD_DATA_PATH" ]; then
    echo "\$ALPHAFOLD_DATA_PATH variable not set. Setting to default path:"
    echo "/panfs/pfs.local/scratch/all/db"
    echo ""
    export ALPHAFOLD_DATA_PATH=/panfs/pfs.local/scratch/all/db
fi

singularity run -B $(realpath $ALPHAFOLD_DATA_PATH):/data \
    -B .:/etc \
    --pwd /app/alphafold \
    --nv $CONTAINERDIR/alphafold-${EBVERSIONALPHAFOLD}.sif \
    --data_dir=/data \
    --uniref90_database_path=/data/uniref90/uniref90.fasta \
    --mgnify_database_path=/data/mgnify/mgy_clusters.fa \
    --template_mmcif_dir=/data/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat \
    "$@"
Explanation of Singularity flags
- The database and models are stored in $ALPHAFOLD_DATA_PATH.
- A cache file ld.so.cache will be written to /etc, which is not allowed on the cluster. The workaround is to bind-mount e.g. the current working directory to /etc inside the container. [-B .:/etc]
- You must launch AlphaFold from /app/alphafold inside the container due to a known issue. [--pwd /app/alphafold]
- The --nv flag enables GPU support.
Explanation of AlphaFold flags
- The default command of the container is at /app/run_alphafold.sh.
- As a consequence of the Singularity --pwd flag, the fasta and output paths must be full paths (e.g. /home/$USER/mydir), not relative paths (e.g. ./mydir). You may use $PWD as demonstrated.
- The max_template_date is of the form YYYY-MM-DD.
- Only the database paths in mark_flags_as_required of run_alphafold.py are included, because the optional paths depend on db_preset (full_dbs or reduced_dbs) and model_preset.
Slurm scripts
Below are some templates for your Slurm scripts.
Monomer with full_dbs
#!/bin/bash
#SBATCH --partition=sixhour   # partition
#SBATCH --gres=gpu:1          # number of GPUs
#SBATCH --nodes=1             # number of nodes
#SBATCH --cpus-per-task=8     # number of cores
#SBATCH --mem=40g             # memory
#SBATCH --time=6:00:00        # time

module purge
module load singularity alphafold

run --fasta_paths=$PWD/your_fasta_file \
    --output_dir=$PWD/outdir \
    --model_preset=monomer \
    --db_preset=full_dbs \
    --bfd_database_path=/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb70_database_path=/data/pdb70/pdb70 \
    --uniclust30_database_path=/data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --max_template_date=YYYY-MM-DD \
    --use_gpu_relax=True
Multimer with reduced_dbs
#!/bin/bash
#SBATCH --partition=sixhour   # partition
#SBATCH --gres=gpu:1          # number of GPUs
#SBATCH --nodes=1             # number of nodes
#SBATCH --cpus-per-task=8     # number of cores
#SBATCH --mem=40g             # memory
#SBATCH --time=6:00:00        # time

module purge
module load singularity alphafold

run --fasta_paths=$PWD/your_fasta_file \
    --output_dir=$PWD/outdir \
    --model_preset=multimer \
    --db_preset=reduced_dbs \
    --pdb_seqres_database_path=/data/pdb_seqres/pdb_seqres.txt \
    --uniprot_database_path=/data/uniprot/uniprot.fasta \
    --small_bfd_database_path=/data/small_bfd/bfd-first_non_consensus_sequences.fasta \
    --max_template_date=YYYY-MM-DD \
    --use_gpu_relax=True
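To run either template, save it to a file and submit it with sbatch. The file name below is only an illustration.

# Sketch: submit the monomer template and check on the job.
sbatch alphafold_monomer.sh   # alphafold_monomer.sh is a placeholder file name
squeue -u $USER               # show your queued and running jobs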
Notes
- You may need to request 8 CPU cores due to this line printed in the output:
  Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpys2ocad8/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./seq.fasta /share/resources/data/alphafold/mgnify/mgy_clusters.fa"
- You must provide a value for --max_template_date. See https://github.com/deepmind/alphafold/blob/main/run_alphafold.py#L92-L934.
- The flag --use_gpu_relax is only available in version 2.1.2 and above.
- You are not required to use the run wrapper script. You can always provide the full Singularity command, as sketched below.
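For reference, here is roughly what the monomer full_dbs example looks like when written out as a full Singularity command instead of the run wrapper. This is a sketch assembled from the wrapper and the monomer template above; adjust the database root, fasta file, output directory, presets, and date for your own run.

# Sketch: the full Singularity command behind the monomer full_dbs example.
export ALPHAFOLD_DATA_PATH=/panfs/pfs.local/scratch/all/db
singularity run -B $(realpath $ALPHAFOLD_DATA_PATH):/data \
    -B .:/etc \
    --pwd /app/alphafold \
    --nv $CONTAINERDIR/alphafold-${EBVERSIONALPHAFOLD}.sif \
    --data_dir=/data \
    --uniref90_database_path=/data/uniref90/uniref90.fasta \
    --mgnify_database_path=/data/mgnify/mgy_clusters.fa \
    --template_mmcif_dir=/data/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat \
    --bfd_database_path=/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb70_database_path=/data/pdb70/pdb70 \
    --uniclust30_database_path=/data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --fasta_paths=$PWD/your_fasta_file \
    --output_dir=$PWD/outdir \
    --model_preset=monomer \
    --db_preset=full_dbs \
    --max_template_date=YYYY-MM-DD \
    --use_gpu_relax=True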