
blast

Table of Contents

Versions and Availability
About the Software
Usage
Resources

Versions and Availability

About the Software

The Basic Local Alignment Search Tool (BLAST) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotide sequences of DNA.

Homepage: http://blast.ncbi.nlm.nih.gov/

Usage

BLAST+ provides a suite of tools, including blastx, blastn, and blastp. The command line below, provided by NCBI, shows a blastn search against a database. This command:

    blastn -db nt -query nt.fsa -out results.out

will run a blastn search of nt.fsa (a nucleotide sequence in FASTA format) against the nt database, printing results to the file results.out.

BLAST+ uses the environment variable $BLASTDB to locate the directory that contains the database files, so set this variable before running any BLAST command, as in the example below:

Example: run blastx search in PBS batch job

This example shows a blastx search of trinity.fasta against the nr database, printing results to the file blastx.out. The nr database files are in the directory /work/ychen64/nr.

#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=72:00:00
#PBS -A your_allocation_name
export BLASTDB=/work/ychen64/nr
blastx -query trinity.fasta -db nr -out blastx.out -num_threads 20

The -num_threads option specifies the number of threads (CPUs) used in the BLAST search. To make full use of the allocated cores, this number should match the ppn value in the "#PBS -l nodes=1:ppn=" directive.
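One way to keep the two numbers in sync is to derive the thread count from the scheduler's environment instead of hard-coding it. The following is a minimal sketch, not part of BLAST+: detect_threads is a hypothetical helper name, PBS_NUM_PPN is assumed to be set by Torque-style PBS installations (it may be absent on other PBS variants), and SLURM_CPUS_PER_TASK is assumed to be set by SLURM when -c is given.

```shell
#!/bin/bash
# Hypothetical helper: pick a thread count from the scheduler environment.
# Assumes PBS_NUM_PPN (Torque-style PBS) or SLURM_CPUS_PER_TASK (SLURM with -c)
# is set by the scheduler; falls back to a single thread otherwise.
detect_threads() {
    if [ -n "${PBS_NUM_PPN:-}" ]; then
        echo "$PBS_NUM_PPN"
    elif [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        echo "$SLURM_CPUS_PER_TASK"
    else
        echo 1
    fi
}

NTHREADS=$(detect_threads)
echo "Using $NTHREADS threads"
# In the job script, the search line would then read:
# blastx -query trinity.fasta -db nr -out blastx.out -num_threads "$NTHREADS"
```

With this in place, changing ppn in the PBS directive is enough; the -num_threads value follows automatically on systems that export the variable.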

Example: create local BLAST database in PBS batch job

A local BLAST database can be created with the Perl script update_blastdb.pl, which is included in BLAST+ and downloads preformatted databases from NCBI. Perl is required to run this script. In this example, an nr database is created under /work/ychen64.

#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=72:00:00
#PBS -A your_allocation_name
cd /work/ychen64
# Specify the database name here:
export DATABASE=nr

# -p keeps mkdir from failing when the script is rerun to update the database
mkdir -p "$DATABASE"
cd "$DATABASE"
update_blastdb.pl --decompress --verbose $DATABASE

Creating a large local BLAST database takes a while, so update_blastdb.pl should be run in an interactive or batch job.

Once the local BLAST database is created, it needs to be updated regularly (every few days, weeks, or months, depending on the database). To update it, run the script above again. Documentation for update_blastdb.pl is displayed when the script is run without any arguments.
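Before launching a long search, it can be worth checking that the download actually left database files in place. The sketch below is only an illustration: db_present is a hypothetical helper, and it assumes that a multi-volume database has an alias file (NAME.pal for protein, NAME.nal for nucleotide) and a single-volume database has an index file (NAME.pin or NAME.nin). The path follows the example above.

```shell
#!/bin/bash
# Hypothetical helper: report whether DIR appears to hold BLAST database NAME.
# Assumes multi-volume databases ship an alias file (NAME.pal or NAME.nal)
# and single-volume databases an index file (NAME.pin or NAME.nin).
db_present() {
    local dir=$1 name=$2 ext
    for ext in pal nal pin nin; do
        if [ -f "$dir/$name.$ext" ]; then
            return 0
        fi
    done
    return 1
}

# Path and database name follow the example above; adjust to your own location.
if db_present /work/ychen64/nr nr; then
    echo "nr database files found"
else
    echo "nr database files not found" >&2
fi
```

When BLAST+ is on the PATH and $BLASTDB is set, running blastdbcmd -db nr -info prints a summary of the database as BLAST itself sees it, which is a more direct check.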

Note:

Please do not search against the databases on the NCBI BLAST server (i.e. with the "-remote" option), especially for large searches. Search against local BLAST databases only.
Please use only one compute node for a BLAST job. The parallel search provided by BLAST+ is based on thread parallelism, so it cannot span multiple nodes.
On clusters that use the SLURM job scheduler (rather than PBS), use #SBATCH -c to specify the number of threads per process in the SLURM directives, and omit #SBATCH -n.
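For illustration, a SLURM counterpart of the blastx job above might look like the sketch below. The partition name workq and the 20-core request are carried over from the PBS example and are assumptions; check the values for your cluster. The THREADS variable and the PATH check are additions for this sketch. SLURM sets SLURM_CPUS_PER_TASK when -c is given.

```shell
#!/bin/bash
# Partition name is an assumption carried over from the PBS example.
#SBATCH -p workq
# One node only: BLAST+ threads cannot span nodes.
#SBATCH -N 1
# Threads per process; takes the place of ppn=20. No -n directive is given.
#SBATCH -c 20
#SBATCH -t 72:00:00
#SBATCH -A your_allocation_name

export BLASTDB=/work/ychen64/nr
# SLURM sets SLURM_CPUS_PER_TASK when -c is given; fall back to 1 otherwise.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

if command -v blastx >/dev/null 2>&1; then
    blastx -query trinity.fasta -db nr -out blastx.out -num_threads "$THREADS"
else
    echo "blastx not found in PATH" >&2
fi
```

Reading the thread count from SLURM_CPUS_PER_TASK keeps -num_threads consistent with the -c directive when the request is changed.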

Resources

The BLAST Home Page provides links to protein and genomic data sets, as well as information on specific tools.

Last modified: September 10 2020 17:18:38.

High Performance Computing

Louisiana State University