
gnuparallel

About

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
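For example, running one job per input file, or splitting a stream across several instances of a command (a minimal sketch; the file names are hypothetical):

# compress every .log file in the current directory, one gzip job per file
parallel gzip ::: *.log

# split stdin into blocks and pipe each block to a separate "wc -l"
cat bigfile.txt | parallel --pipe --block 10M wc -l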

Versions and Availability

Module Names for gnuparallel on qb2

Machine   Version     Module Name
qb2       20170122    gnuparallel/20170122

The information here is applicable to LSU HPC and LONI systems.

Shells

A user may choose between using /bin/bash and /bin/tcsh. Details about each shell follow.

/bin/bash

System resource file: /etc/profile

When one accesses the shell, the following user files are read in, if they exist, in this order:

  1. ~/.bash_profile (anything sent to STDOUT or STDERR here will break non-interactive tools such as rsync; see the sketch at the end of this subsection)
  2. ~/.bashrc (interactive login only)
  3. ~/.profile

When a user logs out of an interactive session, the file ~/.bash_logout is executed if it exists.

The default value of the environment variable PATH is set automatically using SoftEnv. See below for more information.
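Because of the caveat above, any output produced by ~/.bash_profile should be limited to interactive shells. A minimal sketch (the greeting text is only an example):

# Hypothetical fragment of ~/.bash_profile: only print messages when the
# shell is interactive, so non-interactive tools (rsync, scp) are not broken.
if [[ $- == *i* ]]; then
    echo "Welcome to the cluster"
fi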

/bin/tcsh

The file ~/.cshrc is used to customize the user's environment if their login shell is /bin/tcsh.
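A minimal ~/.cshrc sketch (the variable and alias are only examples):

# Hypothetical ~/.cshrc fragment for tcsh users
setenv WDIR /work/$USER      # example environment variable
alias ll 'ls -l'             # example alias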

Modules

Modules is a utility which helps users manage the complex business of setting up their shell environment in the face of potentially conflicting application versions and libraries.

Default Setup

When a user logs in, the system looks for a file named .modules in their home directory. This file contains module commands to set up the initial shell environment.
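A minimal sketch of such a file (the versions are taken from the module listings on this page):

# Hypothetical ~/.modules file: modules to load at every login
module load gnuparallel/20170122
module load INTEL/14.0.2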

Viewing Available Modules

The command

$ module avail

displays a list of all the modules available. The list will look something like:

--- some stuff deleted ---
velvet/1.2.10/INTEL-14.0.2
vmatch/2.2.2

---------------- /usr/local/packages/Modules/modulefiles/admin -----------------
EasyBuild/1.11.1       GCC/4.9.0              INTEL-140-MPICH/3.1.1
EasyBuild/1.13.0       INTEL/14.0.2           INTEL-140-MVAPICH2/2.0
--- some stuff deleted ---

The module names take the form appname/version/compiler, providing the application name, the version, and information about how it was compiled (if needed).

Managing Modules

Besides avail, there are other basic module commands to use for manipulating the environment. These include:

add/load mod1 mod2 ... modn . . . Add modules
rm/unload mod1 mod2 ... modn  . . Remove modules
switch/swap mod . . . . . . . . . Switch or swap one module for another
display/show mod  . . . . . . . . Show the changes a module makes to the environment
avail . . . . . . . . . . . . . . List available module names
whatis mod1 mod2 ... modn . . . . Describe listed modules

The -h option to module will list all available commands.
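For example, using module names shown above (a sketch; adjust versions to what "module avail" reports on your machine):

$ module load gnuparallel/20170122      # add GNU parallel to the environment
$ module whatis gnuparallel/20170122    # short description of the module
$ module swap INTEL/14.0.2 GCC/4.9.0    # replace one compiler module with another
$ module unload gnuparallel/20170122    # remove it again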

Module is currently available only on SuperMIC.

Usage

GNU parallel can be used to parallelize typical serial and MPI-based applications.

(1) Parallel serial jobs

Example of a blast job on Mike:

#!/bin/bash

#PBS -A hpc_smictest3
#PBS -l nodes=2:ppn=16
#PBS -l walltime=1:00:00
#PBS -q workq

cd $PBS_O_WORKDIR
export JOBS_PER_NODE=16         # one serial job per core (ppn=16)
export WDIR=$PBS_O_WORKDIR

# --progress : show progress
# --joblog   : write a job log to "logfile"
# -j         : number of simultaneous jobs per node
# --slf      : list of nodes assigned to your job
# --workdir  : working directory on each node
# cmd_blast.sh is the script to parallelize; {} = input file, {/.} = output name
parallel --progress \
         --joblog logfile \
         -j $JOBS_PER_NODE \
         --slf $PBS_NODEFILE \
         --workdir $WDIR \
         ./cmd_blast.sh {} {/.} :::: input.lst

where input.lst contains the list of job inputs:

/work/$USER/blast/data/input1.faa
/work/$USER/blast/data/input2.faa
....
/work/$USER/blast/data/input200.faa

where cmd_blast.sh is the script that runs one serial blast job. A single job by itself would be run as:

./cmd_blast.sh input1.faa input1

(first argument: input file; second argument: output name). The script itself is:

#!/bin/bash

export WDIR=/xxx/xxx            # set to your working directory
cd $WDIR
blastp -query $1 -db db/img_v400_PROT.00 -out output/$2.out -outfmt 7 -max_target_seqs 100 -num_threads 2
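To check what GNU parallel will actually execute before submitting the job, the --dry-run option prints the generated commands instead of running them. A sketch of the expansion for the listing above ({} is the full line from input.lst, {/.} is the file name without directory and extension):

$ parallel --dry-run ./cmd_blast.sh {} {/.} :::: input.lst
./cmd_blast.sh /work/$USER/blast/data/input1.faa input1
./cmd_blast.sh /work/$USER/blast/data/input2.faa input2
...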
	

(2) Parallel MPI jobs

Use "mpirun" to run a laplace

#!/bin/bash

#PBS -A your_allocation_name
#PBS -l walltime=2:00:00
#PBS -l nodes=4:ppn=16
#PBS -q checkpt

export JOBS_PER_NODE=8          # concurrent MPI jobs per node
export NPROCS=2                 # MPI ranks per job (8 jobs x 2 ranks = 16 = ppn)
export WDIR=$PBS_O_WORKDIR
cd $WDIR
parallel --progress \
         -j $JOBS_PER_NODE \
         --slf $PBS_NODEFILE \
         --workdir $WDIR \
         ./cmd_mpi.sh {} $NPROCS :::: input.lst
	

where cmd_mpi.sh is the script that runs one MPI job:

#!/bin/bash

export WDIR=$PBS_O_WORKDIR
FILE=$(eval echo $1)            # expand variables such as $USER in the input path
param=`cat ${FILE}`             # read the solver parameters from the input file
mpirun -ppn $2 $WDIR/lap_mpi $param

where input.lst contains the list of job inputs:

/work/$USER/laplace/data/input1
/work/$USER/laplace/data/input2
....
/work/$USER/laplace/data/input200

$ cat input1
4096 4096 2 2 0.08 20000 0 0
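Putting the pieces together: for the first entry in input.lst, GNU parallel invokes cmd_mpi.sh with the file name and $NPROCS, and cmd_mpi.sh in turn launches mpirun with the parameters read from that file. A sketch of the expansion:

# command generated by parallel for the first input file
./cmd_mpi.sh /work/$USER/laplace/data/input1 2

# which, inside cmd_mpi.sh, becomes
mpirun -ppn 2 $WDIR/lap_mpi 4096 4096 2 2 0.08 20000 0 0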

