blast

Table of Contents

Versions and Availability
About the Software
Usage
Resources

Versions and Availability

Module Names for blast on qb2

Machine	Version	Module Name
qb2	2.11.0	blast/2.11.0/gcc-9.3.0

Module FAQ

The information here is applicable to LSU HPC and LONI systems.

Shells

A user may choose between /bin/bash and /bin/tcsh. Details about each shell follow.

/bin/bash

System resource file: /etc/profile

When a user accesses the shell, the following user files are read in if they exist (in order):

~/.bash_profile (anything sent to STDOUT or STDERR will cause things like rsync to break)
~/.bashrc (interactive login only)
~/.profile
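
Because output from ~/.bash_profile can break tools such as rsync, one common precaution is to print messages only when the shell is interactive. The following is a minimal sketch, not a required setup. The helper name is_interactive_flags is hypothetical; it only checks whether the shell option string contains the "i" flag that marks an interactive shell.

```shell
# Hypothetical helper: succeeds if the given shell option string
# (normally "$-") contains the "i" flag set for interactive shells.
is_interactive_flags() {
    case "$1" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# In ~/.bash_profile: write to the terminal only in interactive sessions,
# so that non-interactive tools such as rsync see no extra output.
if is_interactive_flags "$-"; then
    echo "Logged in to $(hostname)"
fi
```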

When a user logs out of an interactive session, the file ~/.bash_logout is executed if it exists.

The default value of the PATH environment variable is set automatically using Modules. See below for more information.

/bin/tcsh

The file ~/.cshrc is used to customize the user's environment if their login shell is /bin/tcsh.

Modules

Modules is a utility which helps users manage the complex business of setting up their shell environment in the face of potentially conflicting application versions and libraries.

Default Setup

When a user logs in, the system looks for a file named .modules in their home directory. This file contains module commands to set up the initial shell environment.

Viewing Available Modules

The command

$ module avail

displays a list of all the modules available. The list will look something like:

--- some stuff deleted ---
velvet/1.2.10/INTEL-14.0.2
vmatch/2.2.2

---------------- /usr/local/packages/Modules/modulefiles/admin -----------------
EasyBuild/1.11.1       GCC/4.9.0              INTEL-140-MPICH/3.1.1
EasyBuild/1.13.0       INTEL/14.0.2           INTEL-140-MVAPICH2/2.0
--- some stuff deleted ---

The module names take the form appname/version/compiler, providing the application name, the version, and information about how it was compiled (if needed).
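
As an illustration of this naming scheme, the sketch below splits a module name into its three parts using plain shell features. The variable names are chosen for this example only; the module name is the qb2 entry from the table above.

```shell
# Split a module name of the form appname/version/compiler into its parts.
module_name="blast/2.11.0/gcc-9.3.0"

# IFS=/ applies only to this read. The compiler field stays empty for
# names such as vmatch/2.2.2 that have no third component.
IFS=/ read -r app version compiler <<EOF
$module_name
EOF

echo "application: $app"
echo "version:     $version"
echo "compiler:    ${compiler:-not specified}"
```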

Managing Modules

Besides avail, there are other basic module commands to use for manipulating the environment. These include:

add/load mod1 mod2 ... modn . . . Add modules
rm/unload mod1 mod2 ... modn  . . Remove modules
switch/swap mod1 mod2 . . . . . . Switch or swap one module for another
display/show mod  . . . . . . . . Show the environment changes a module makes
list  . . . . . . . . . . . . . . List modules loaded in the environment
avail . . . . . . . . . . . . . . List available module names
whatis mod1 mod2 ... modn . . . . Describe listed modules

The -h option to module will list all available commands.
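
A typical session with these commands might look like the sketch below. It uses the qb2 module name listed under "Versions and Availability"; on another cluster the name may differ. The check for the module command is an addition for this example, so that the snippet exits cleanly on a machine without Modules.

```shell
# Module name taken from the qb2 table on this page; adjust for your cluster.
BLAST_MODULE=blast/2.11.0/gcc-9.3.0

if command -v module >/dev/null 2>&1; then
    module avail blast            # list installed blast modules
    module load "$BLAST_MODULE"   # add blast to the environment
    module list                   # confirm what is loaded
    module show "$BLAST_MODULE"   # inspect what the module changes
    module unload "$BLAST_MODULE" # remove it again
else
    echo "module command not found; run this on a cluster node" >&2
fi
```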

Did not find the version you want to use?

If a software package you would like to use for your research is not available on a cluster, you can request it to be installed. The software requests are evaluated by the HPC staff on a case-by-case basis. Before you send in a software request, please go through the information below.

Types of request

Depending on how many users need to use the software, software requests are divided into three types, each of which corresponds to the location where the software is installed:

The user's home directory

Software packages installed here will be accessible only to the user.
It is suitable for software packages that will be used by a single user.
Python, Perl and R modules should be installed here.

/project

Software packages installed in /project can be accessed by a group of users.
It is suitable for software packages that
- need to be shared by users from the same research group, or
- are bigger than the quota on the home file system.
This type of request must be sent by the PI of the research group, who may be asked to apply for a storage allocation.

/usr/local/packages

Software packages installed under /usr/local/packages can be accessed by all users.
It is suitable for software packages that will be used by users from multiple research groups.
This type of request must be sent by the PI of a research group.

How to request

Please send an email to sys-help@loni.org with the following information:

Your user name
The name of the cluster where you want to use the requested software
The name, version and download link of the software
Specific installation instructions if any (e.g. compiler flags, variants and flavor, etc.)
Why the software is needed
Where the software should be installed (home directory, /project, or /usr/local/packages), with a justification stating how many users are expected to use it.

Please note that once the software is installed, testing and validation are the user's responsibility.

About the Software

Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. - Homepage: http://blast.ncbi.nlm.nih.gov/

Usage

BLAST+ provides a suite of tools, such as blastx, blastn, and blastp. The command line below, provided by NCBI, is a blastn search against a database. This command:

    blastn -db nt -query nt.fsa -out results.out

will run a blastn search of nt.fsa (a nucleotide sequence in FASTA format) against the nt database, printing results to the file results.out.
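
Before starting a long search, it can be worth checking that the query file really is in FASTA format. The sketch below relies only on the fact that a FASTA record begins with a ">" header line. The helper name looks_like_fasta is hypothetical and the check is deliberately shallow; it does not validate the sequence itself.

```shell
# Hypothetical helper: succeeds if the first non-blank line of the file
# starts with ">", as a FASTA record header does.
looks_like_fasta() {
    [ -r "$1" ] || return 1
    first_line=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case "$first_line" in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

query=nt.fsa
if looks_like_fasta "$query"; then
    echo "$query looks like FASTA"
else
    echo "$query is missing, unreadable, or not FASTA" >&2
fi
```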

BLAST+ uses the environment variable $BLASTDB to locate the directory that contains the database. This variable should therefore be set before running any blast command, as in the batch job examples that follow.
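
A quick sanity check on $BLASTDB can save a failed job. The sketch below assumes that the files of a database share the database name as a prefix (for instance files named nr.* for the nr database). The helper name blastdb_present is hypothetical, and the directory is the one used in the batch example on this page.

```shell
# Hypothetical helper: succeeds if directory $1 contains at least one
# file whose name starts with "<database name>.".
blastdb_present() {
    dir=$1
    name=$2
    [ -d "$dir" ] || return 1
    for f in "$dir/$name".*; do
        # An unmatched pattern stays literal, so test that the file exists.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Replace this path with the location of your own database.
export BLASTDB=/work/ychen64/nr
if blastdb_present "$BLASTDB" nr; then
    echo "nr database found in $BLASTDB"
else
    echo "no nr database files in $BLASTDB" >&2
fi
```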

Example: run blastx search in PBS batch job

This example shows a blastx search of trinity.fasta against the nr database, printing results to the file blastx.out. The nr database files are in the directory /work/ychen64/nr.

#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=72:00:00
#PBS -A your_allocation_name
export BLASTDB=/work/ychen64/nr
blastx -query trinity.fasta -db nr -out blastx.out -num_threads 20

The -num_threads option sets the number of threads (CPU cores) used in the BLAST search. To use all the cores requested for the job, it should match the ppn value in the "#PBS -l nodes=1:ppn=" directive (20 in this example).
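
To keep the two numbers in step, the thread count can be derived inside the job instead of being typed twice. This is a minimal sketch for a single-node job. It assumes that $PBS_NODEFILE lists one line per allocated core, which depends on the scheduler configuration and is not stated on this page. The helper name job_threads is hypothetical.

```shell
# Hypothetical helper: print the number of cores allocated to the job.
# Assumes $PBS_NODEFILE has one line per core; prints 1 if it is not set.
job_threads() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # Arithmetic expansion strips any padding wc adds around the count.
        echo $(( $(wc -l < "$PBS_NODEFILE") ))
    else
        echo 1
    fi
}

NTHREADS=$(job_threads)
echo "blastx would run with -num_threads $NTHREADS"
```

Inside the job script, the value could then be passed as -num_threads "$NTHREADS".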

Example: create local BLAST database in PBS batch job

A local BLAST database can be set up with the Perl script update_blastdb.pl, which is included in BLAST+ and downloads preformatted databases from NCBI. Perl is required to run this script. In this example, an nr database is created under /work/ychen64, in the directory /work/ychen64/nr.

#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=72:00:00
#PBS -A your_allocation_name
cd /work/ychen64
# Specify the database name here:
export DATABASE=nr

# -p: do not fail if the directory already exists (e.g. when updating)
mkdir -p "$DATABASE"
cd "$DATABASE"
update_blastdb.pl --decompress --verbose "$DATABASE"

It takes a while to create a large local BLAST database, so update_blastdb.pl should be run in an interactive or batch job.

Once the local BLAST database is created, it should be updated regularly (every few days, weeks, or months, depending on the database). To update it, run the script above again. Documentation for update_blastdb.pl is available by running the script without any arguments.

Note:

Please do not search against the databases on the NCBI BLAST server (i.e. with the "-remote" option), especially for large searches. Search against a local BLAST database only.
Please use only one compute node per BLAST job. The parallel search provided by BLAST+ is based on thread parallelism, so it cannot span multiple nodes.
On clusters with the SLURM job scheduler (rather than PBS), use #SBATCH -c to specify the number of threads per process, and omit #SBATCH -n.
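
For reference, a SLURM version of the blastx job above might look like the sketch below. It is an adaptation under stated assumptions, not a tested site template: the partition directive is left out because partition names are site-specific, and the check for blastx is added only so that the script fails gently if the blast module has not been loaded.

```shell
#!/bin/bash
# One node only, since BLAST+ threads cannot span nodes.
#SBATCH -N 1
# Threads per process; no -n directive is given.
#SBATCH -c 20
#SBATCH -t 72:00:00
#SBATCH -A your_allocation_name
# A partition directive (#SBATCH -p ...) may also be needed at your site.

export BLASTDB=/work/ychen64/nr

# SLURM sets SLURM_CPUS_PER_TASK when -c is given; fall back to 1 otherwise.
THREADS=${SLURM_CPUS_PER_TASK:-1}

if command -v blastx >/dev/null 2>&1; then
    blastx -query trinity.fasta -db nr -out blastx.out -num_threads "$THREADS"
else
    echo "blastx not found; load the blast module first" >&2
fi
```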

Resources

The BLAST Home Page provides links to protein and genomic data sets, as well as information on specific tools.

Last modified: September 10 2020 17:18:38.

High Performance Computing

Louisiana State University