Skip to main content

mpiblast

About

mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. By efficiently utilizing distributed computational resources through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, mpiBLAST improves NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors. mpiBLAST is also portable across many different platforms and operating systems. Lastly, a renewed focus and consolidation of the many codebases has positioned mpiBLAST to continue to be of high utility to the bioinformatics community.

Versions and Availability

Softenv Keys for mpiblast on tezpur
Machine Version Softenv Key
▶ Softenv FAQ?

The information here is applicable to LSU HPC and LONI systems.

Shells

A user may choose between using /bin/bash and /bin/tcsh. Details about each shell follows.

/bin/bash

System resource file: /etc/profile

When one access the shell, the following user files are read in if they exist (in order):

  1. ~/.bash_profile (anything sent to STDOUT or STDERR will cause things like rsync to break)
  2. ~/.bashrc (interactive login only)
  3. ~/.profile

When a user logs out of an interactive session, the file ~/.bash_logout is executed if it exists.

The default value of the environmental variable, PATH, is set automatically using SoftEnv. See below for more information.

/bin/tcsh

The file ~/.cshrc is used to customize the user's environment if his login shell is /bin/tcsh.

Softenv

SoftEnv is a utility that is supposed to help users manage complex user environments with potentially conflicting application versions and libraries.

System Default Path

When a user logs in, the system /etc/profile or /etc/csh.cshrc (depending on login shell, and mirrored from csm:/cfmroot/etc/profile) calls /usr/local/packages/softenv-1.6.2/bin/use.softenv.sh to set up the default path via the SoftEnv database.

SoftEnv looks for a user's ~/.soft file and updates the variables and paths accordingly.

Viewing Available Packages

The command softenv will provide a list of available packages. The listing will look something like:

$ softenv
These are the macros available:
*   @default
These are the keywords explicitly available:
+amber-8                       Applications: 'Amber', version: 8 Amber is a
+apache-ant-1.6.5              Ant, Java based XML make system version: 1.6.
+charm-5.9                     Applications: 'Charm++', version: 5.9 Charm++
+default                       this is the default environment...nukes /etc/
+essl-4.2                      Libraries: 'ESSL', version: 4.2 ESSL is a sta
+gaussian-03                   Applications: 'Gaussian', version: 03 Gaussia
... some stuff deleted ...
Managing SoftEnv

The file ~/.soft in the user's home directory is where the different packages are managed. Add the +keyword into your .soft file. For instance, ff one wants to add the Amber Molecular Dynamics package into their environment, the end of the .soft file should look like this:

+amber-8

@default

To update the environment after modifying this file, one simply uses the resoft command:

% resoft

The command soft can be used to manipulate the environment from the command line. It takes the form:

$ soft add/delete +keyword

Using this method of adding or removing keywords requires the user to pay attention to possible order dependencies. That is, best results require the user to remove keywords in the reverse order in which they were added. It is handy to test out individual keys, but can lead to trouble if changing multiple keys. Changing the .soft file and issuing the resoft is the recommended way of dealing with multiple changes.

The soft environment will create a file $HOME/.ncbirc if not present, you can modify as required

[NCBI]
Data=/usr/local/packages/mpiblast/1.5.0-pio/intel-11.1-mvapich-1.1/ncbi/data

[BLAST]
BLASTDB=/work/$USER/MPIBLAST
BLASTMAT=/usr/local/packages/mpiblast/1.5.0-pio/intel-11.1-mvapich-1.1/ncbi/data

[mpiBLAST]
Shared=/work/$USER/MPIBLAST
Local=/work/$USER/MPIBLAST

Usage

mpiblast is normally run via a PBS batch job.

▶ Open Example?

Sample PBS Script

The example assumes a mpiblast softenv key has been set using MVAPICH.

#!/bin/sh
#
#PBS -q workq
#PBS -M whatever@whereever.foo
#PBS -A my_allocation_code
#PBS -l nodes=2:ppn=4
#PBS -l cput=06:00:00
#PBS -l walltime=06:00:00
#PBS -V
#PBS -N mpiblast
#PBS -m ae

export FORMTEXEC=mpiformatdb
export BLASTEXEC=mpiblast
export EXECDIR=/usr/local/packages/mpiblast/1.5.0-pio/intel-11.1-mvapich-1.1/bin
export WORK_DIR=$PBS_O_WORKDIR
export NPROCS=`wc -l $PBS_NODEFILE |gawk '//{print $1}'`
cd $WORK_DIR
$EXECDIR/$FORMTEXEC -i drosoph.nt --nfrags=4 -pF
mpirun -machinefile $PBS_NODEFILE -np $NPROCS $EXECDIR/$BLASTEXEC -d drosoph.nt \
  -i test.in -p blastn -o results.txt

The script is submitted using qsub:

$ qsub script_name
▶ QSub FAQ?

Portable Batch System: qsub

qsub

All HPC@LSU clusters use the Portable Batch System (PBS) for production processing. Jobs are submitted to PBS using the qsub command. A PBS job file is basically a shell script which also contains directives for PBS.

Usage
$ qsub job_script

Where job_script is the name of the file containing the script.

PBS Directives

PBS directives take the form:

#PBS -X value

Where X is one of many single letter options, and value is the desired setting. All PBS directives must appear before any active shell statement.

Example Job Script
 #!/bin/bash
 #
 # Use "workq" as the job queue, and specify the allocation code.
 #
 #PBS -q workq
 #PBS -A your_allocation_code
 # 
 # Assuming you want to run 16 processes, and each node supports 4 processes, 
 # you need to ask for a total of 4 nodes. The number of processes per node 
 # will vary from machine to machine, so double-check that your have the right 
 # values before submitting the job.
 #
 #PBS -l nodes=4:ppn=4
 # 
 # Set the maximum wall-clock time. In this case, 10 minutes.
 #
 #PBS -l walltime=00:10:00
 # 
 # Specify the name of a file which will receive all standard output,
 # and merge standard error with standard output.
 #
 #PBS -o /scratch/myName/parallel/output
 #PBS -j oe
 # 
 # Give the job a name so it can be easily tracked with qstat.
 #
 #PBS -N MyParJob
 #
 # That is it for PBS instructions. The rest of the file is a shell script.
 # 
 # PLEASE ADOPT THE EXECUTION SCHEME USED HERE IN YOUR OWN PBS SCRIPTS:
 #
 #   1. Copy the necessary files from your home directory to your scratch directory.
 #   2. Execute in your scratch directory.
 #   3. Copy any necessary files back to your home directory.

 # Let's mark the time things get started.

 date

 # Set some handy environment variables.

 export HOME_DIR=/home/$USER/parallel
 export WORK_DIR=/scratch/myName/parallel
 
 # Set a variable that will be used to tell MPI how many processes will be run.
 # This makes sure MPI gets the same information provided to PBS above.

 export NPROCS=`wc -l $PBS_NODEFILE |gawk '//{print $1}'`

 # Copy the files, jump to WORK_DIR, and execute! The program is named "hydro".

 cp $HOME_DIR/hydro $WORK_DIR
 cd $WORK_DIR
 mpirun -machinefile $PBS_NODEFILE -np $NPROCS $WORK_DIR/hydro

 # Mark the time processing ends.

 date
 
 # And we're out'a here!

 exit 0

You can download the database file used in the above script

wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/drosoph.nt.gz
gunzip drosoph.nt.gz

The input file used was

>Test
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT

Resources

Last modified: November 11 2014 16:58:42.