Versions and Availability
Module Names for blast on qb2
Module FAQ
The information here is applicable to LSU HPC and LONI systems.
A user may choose between using /bin/bash and /bin/tcsh. Details about each shell follow.
System resource file: /etc/profile
When a user accesses the shell, the following user files are read in if they exist (in order):
- ~/.bash_profile (anything sent to STDOUT or STDERR will cause things like rsync to break)
- ~/.bashrc (interactive login only)
When a user logs out of an interactive session, the file ~/.bash_logout is executed if it exists.
The default value of the environment variable PATH is set automatically using Modules. See below for more information.
The file ~/.cshrc is used to customize the user's environment if their login shell is /bin/tcsh.
Modules is a utility which helps users manage the complex business of setting up their shell environment in the face of potentially conflicting application versions and libraries.
When a user logs in, the system looks for a file named .modules in their home directory. This file contains module commands to set up the initial shell environment.
Viewing Available Modules
$ module avail
displays a list of all the modules available. The list will look something like:
--- some stuff deleted ---
velvet/1.2.10/INTEL-14.0.2
vmatch/2.2.2

---------------- /usr/local/packages/Modules/modulefiles/admin -----------------
EasyBuild/1.11.1       GCC/4.9.0              INTEL-140-MPICH/3.1.1
EasyBuild/1.13.0       INTEL/14.0.2           INTEL-140-MVAPICH2/2.0
--- some stuff deleted ---
The module names take the form appname/version/compiler, providing the application name, the version, and information about how it was compiled (if needed).
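The naming scheme can be taken apart in the shell itself. The following is a minimal sketch, using two names from the listing above; parse_module_name is a hypothetical helper written for illustration and is not part of Modules.

```shell
#!/bin/bash
# Split a module name of the form appname/version/compiler into its parts.
# The compiler field is optional, so it may come back empty.
parse_module_name() {
    local app version compiler
    # IFS=/ applies only to this read; the here-document feeds it the name
    IFS=/ read -r app version compiler <<EOF
$1
EOF
    printf 'app=%s version=%s compiler=%s\n' "$app" "$version" "$compiler"
}

parse_module_name velvet/1.2.10/INTEL-14.0.2
parse_module_name vmatch/2.2.2
```

The second call shows a module built without compiler information: its compiler field is simply empty.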
Besides avail, there are other basic module commands to use for manipulating the environment. These include:
add/load mod1 mod2 ... modn . . . Add modules
rm/unload mod1 mod2 ... modn  . . Remove modules
switch/swap mod . . . . . . . . . Switch or swap one module for another
display/show  . . . . . . . . . . List modules loaded in the environment
avail . . . . . . . . . . . . . . List available module names
whatis mod1 mod2 ... modn . . . . Describe listed modules
The -h option to module will list all available commands.
About the Software
Basic Local Alignment Search Tool, or BLAST, is an algorithm
for comparing primary biological sequence information, such as the amino-acid
sequences of different proteins or the nucleotides of DNA sequences.
- Homepage: http://blast.ncbi.nlm.nih.gov/
BLAST+ provides a suite of tools, such as blastx, blastn, and blastp. The command line below, provided by NCBI, runs a blastn search against a database. This command:
blastn -db nt -query nt.fsa -out results.out
will run a blastn search of nt.fsa (a nucleotide sequence in FASTA format) against the nt database, printing results to the file results.out.
BLAST+ uses the environment variable $BLASTDB to locate the directory that holds the database, so set this variable before running any BLAST command, as in the batch job example below.
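As a minimal sketch of this setup, the snippet below exports BLASTDB and confirms that the directory holds files for the named database before any search is launched. The helper blastdb_has is hypothetical (it is not part of BLAST+), and the path is the one used in the example on this page.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if directory $1 holds at least one file
# named "$2.<something>", e.g. nr.pal or nr.00.phr for the nr database.
blastdb_has() {
    local dir="$1" name="$2" f
    [ -d "$dir" ] || return 1
    for f in "$dir/$name".*; do
        # An unmatched glob stays literal, so check that the path really exists
        [ -e "$f" ] && return 0
    done
    return 1
}

export BLASTDB=/work/ychen64/nr   # directory used in the example on this page
if blastdb_has "$BLASTDB" nr; then
    echo "nr database found in $BLASTDB"
else
    echo "no nr database files in $BLASTDB" >&2
fi
```

Running a check like this at the top of a job script catches a mistyped path in seconds rather than after the job has waited in the queue.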
Example: run blastx search in PBS batch job
This example shows a blastx search of trinity.fasta against the nr database, printing results to the file blastx.out. The nr database files are in the directory /work/ychen64/nr.
#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=72:00:00
#PBS -A your_allocation_name

export BLASTDB=/work/ychen64/nr
blastx -query trinity.fasta -db nr -out blastx.out -num_threads 20
The option -num_threads specifies the number of threads (CPUs) used in the BLAST search. It should match the ppn value in the "#PBS -l nodes=1:ppn=" directive so that all requested cores are used.
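One way to keep the two numbers in step is to derive the thread count from the job itself instead of typing it twice. The sketch below assumes a Torque-style PBS that writes one line per allocated core to the file named by $PBS_NODEFILE; count_job_cores is a hypothetical helper, and the final line only prints the resulting command rather than running it.

```shell
#!/bin/bash
# Count the cores granted to this job.
# Assumption: $PBS_NODEFILE lists one line per allocated core (Torque-style).
count_job_cores() {
    local nodefile="${PBS_NODEFILE:-}"
    if [ -n "$nodefile" ] && [ -r "$nodefile" ]; then
        # wc -l prints the line count; $(( )) strips any padding around it
        echo $(( $(wc -l < "$nodefile") ))
    else
        echo 1   # not inside a PBS job: fall back to a single thread
    fi
}

threads=$(count_job_cores)
echo "blastx -query trinity.fasta -db nr -out blastx.out -num_threads $threads"
```

With nodes=1:ppn=20 this should yield 20 under the stated assumption; on a scheduler that writes the node file differently, the count would need to come from another source.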
Example: create local BLAST database in PBS batch job
A local BLAST database can be created with the Perl script "update_blastdb.pl", which is included as part of BLAST+. Perl is required to run this script. In this example, an nr database is created under /work/ychen64.
#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=72:00:00
#PBS -A your_allocation_name

cd /work/ychen64
# Specify the database name here:
export DATABASE=nr
# -p avoids an error if the directory already exists (e.g. when rerunning to update)
mkdir -p $DATABASE
cd $DATABASE
update_blastdb.pl --decompress --verbose $DATABASE
Creating a large local BLAST database takes a while, so update_blastdb.pl should be run in an interactive or batch job.
Once the local BLAST database is created, it needs to be kept up to date (every few days, weeks, or months, depending on the database). Just run the script above again to update the database. The documentation for the update_blastdb.pl script is available by running the script without any arguments.
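To decide when a refresh is due, the age of the database files can be checked from the shell. This is only a sketch: db_is_stale is a hypothetical helper, the 30-day threshold is an arbitrary choice for illustration, and the path follows the example above.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) when no file named "$2.*" in
# directory $1 was modified within the last $3 days. A missing directory
# also counts as stale.
db_is_stale() {
    local dir="$1" name="$2" max_days="$3" recent
    [ -d "$dir" ] || return 0
    # -mtime -N matches files modified less than N days ago
    recent=$(find "$dir" -maxdepth 1 -name "$name.*" -mtime "-$max_days" | head -n 1)
    [ -z "$recent" ]
}

DATABASE=nr
if db_is_stale "/work/ychen64/$DATABASE" "$DATABASE" 30; then
    echo "$DATABASE is missing or older than 30 days: resubmit the update job"
else
    echo "$DATABASE was refreshed within the last 30 days"
fi
```

The check itself is cheap; only the update job it may prompt belongs in an interactive or batch session.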
- Please don't search against the database on the NCBI BLAST server (i.e. using the "-remote" option), especially for large searches. Search against a local BLAST database only.
- Please use only one compute node to run a BLAST job. BLAST+ parallelizes a search with threads, so a single search cannot be spread across multiple nodes.
- On clusters that use the SLURM job scheduler (rather than PBS), use #SBATCH -c to specify the number of threads per process, and omit #SBATCH -n.
- The BLAST Home Page provides links to protein and genomic data sets, as well as information on specific tools.
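The SLURM guidance in the list above can be sketched as a job script equivalent to the PBS blastx example. The directive values mirror that example and are assumptions, not site defaults: the core count per node differs between clusters, and no partition is set because partition names are site-specific. SLURM sets SLURM_CPUS_PER_TASK only when -c is given, so the script falls back to one thread elsewhere.

```shell
#!/bin/bash
#SBATCH -N 1
#SBATCH -c 20
#SBATCH -t 72:00:00
#SBATCH -A your_allocation_name

# Take the thread count from "-c"; use 1 when run outside a SLURM job.
threads="${SLURM_CPUS_PER_TASK:-1}"
export BLASTDB=/work/ychen64/nr

if command -v blastx >/dev/null 2>&1; then
    blastx -query trinity.fasta -db nr -out blastx.out -num_threads "$threads"
else
    echo "blastx not found: load the BLAST+ module first" >&2
fi
```

Reading the thread count from the environment means that changing -c is the only edit needed when moving to a node with a different number of cores.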
Last modified: September 10 2020 17:18:38.