
MPI Parallelization

Compile

MPI (Message Passing Interface) is a standard for communicating data between distributed processes in parallel computing.

Building MPI applications on LONI 64-bit Intel clusters

The proper way to compile with the MPI library is to use the compiler wrapper scripts installed with the library. Once your flavor of MPI (e.g. OpenMPI, MPICH, MVAPICH2) and compiler suite (e.g. Intel, GNU) have been set up using Modules, you're good to go.

The compiler command you use depends on the language you program in. For C, regardless of whether the underlying compiler is the Intel C compiler or the GNU C compiler, the command is mpicc; for C++ it is mpicxx, and for Fortran 90 it is mpif90. The command is then used exactly as one would use the native compiler, for example:

$ mpicc test.c -O3 -o a.out
$ mpif90 test.F -O3 -o a.out
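The language-to-wrapper rule above can be sketched as a small shell helper that picks a wrapper from the source file extension. The function name mpi_wrapper_for is hypothetical, and the sketch assumes the standard wrapper names mpicc, mpicxx and mpif90 that OpenMPI, MPICH and MVAPICH2 all provide; your loaded module may also offer others (for example mpifort).

```shell
#!/bin/sh
# Hypothetical helper: pick the MPI compiler wrapper for a source file.
# Assumes the standard wrapper names shipped by OpenMPI, MPICH and MVAPICH2.
mpi_wrapper_for() {
    case "$1" in
        *.c)                     echo mpicc ;;
        *.cpp|*.cxx|*.cc|*.C)    echo mpicxx ;;
        *.f|*.F|*.f90|*.F90)     echo mpif90 ;;
        *)  echo "no MPI wrapper known for: $1" >&2
            return 1 ;;
    esac
}

# Example (needs an MPI module loaded): compile as with the native compiler
# "$(mpi_wrapper_for test.c)" test.c -O3 -o a.out
```

To see which native compiler and flags a wrapper actually invokes, MPICH and MVAPICH2 accept mpicc -show, while OpenMPI accepts mpicc --showme.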

Each MPI implementation launches a program for parallel execution slightly differently, so refer to the documentation for the specific version you use. As an example, here is what a PBS job script might look like:

#!/bin/bash
#PBS -q workq
#PBS -A your_allocation
#PBS -l nodes=2:ppn=16
#PBS -l walltime=20:00:00
#PBS -o /scratch/$USER/s_type/output/myoutput2
#PBS -j oe
#PBS -N s_type
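export HOME_DIR=/home/$USER
export WORK_DIR=/work/$USER/test
# $PBS_NODEFILE lists each node once per allocated core, so its line
# count is the total number of MPI processes (2 x 16 = 32 here)
export NPROCS=$(wc -l < "$PBS_NODEFILE")
cd "$WORK_DIR"
# Copy the executable built with the MPI wrapper into the work directory
cp "$HOME_DIR/a.out" .
mpirun -machinefile "$PBS_NODEFILE" -np "$NPROCS" "$WORK_DIR/a.out"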
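The NPROCS line relies on the layout of $PBS_NODEFILE: PBS writes each node name once per allocated core, grouped by node. The sketch below reproduces that layout outside a job with a stand-in file so the counting can be tried interactively; the node names node001 and node002 are hypothetical.

```shell
#!/bin/sh
# Sketch: how NPROCS is derived from $PBS_NODEFILE.
# Stand-in file for nodes=2:ppn=16; the node names are hypothetical.
nodefile=$(mktemp)
for node in node001 node002; do
    core=0
    while [ "$core" -lt 16 ]; do
        echo "$node"              # one line per allocated core
        core=$((core + 1))
    done
done > "$nodefile"

# Same count as the NPROCS line in the job script; $(( )) strips the
# padding some wc implementations print around the number.
NPROCS=$(( $(wc -l < "$nodefile") ))
# Distinct node names give the number of nodes.
NNODES=$(( $(sort -u "$nodefile" | wc -l) ))
echo "NPROCS=$NPROCS NNODES=$NNODES"
rm -f "$nodefile"
```

Inside a real job, use "$PBS_NODEFILE" in place of the stand-in file; with nodes=2:ppn=16 this gives 32 processes on 2 nodes.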

An example of an MPI program launched in a SLURM job can be found here.

How to run hybrid MPI and OpenMP jobs

Combining MPI and OpenMP can provide high parallel efficiency. In most hybrid codes, each MPI task runs a team of OpenMP threads that share the task's memory. Hybrid codes of this kind are widely used in many fields of science and technology. A sample bash script for running hybrid jobs follows.

#!/bin/bash
#PBS -A my_allocation_code
#PBS -q workq
#PBS -l walltime=00:10:00
#PBS -l nodes=2:ppn=16        # ppn=16 for SuperMike and ppn=20 for SuperMIC
#PBS -V                       # pass the submitting environment to the job on the assigned nodes
#PBS -N my_job_name           # will be shown in the queue system
#PBS -o my_name.out           # normal output
#PBS -e my_name.err           # error output

export TASK_PER_NODE=2        # number of MPI tasks per node
export THREADS_PER_TASK=8     # OpenMP threads per MPI task (2 x 8 = 16 cores; use 10 on SuperMIC)

cd "$PBS_O_WORKDIR"           # go to the directory where qsub was run

# PBS lists each node once per core; write each node name once to a new file
uniq "$PBS_NODEFILE" > nodefile

# Run the hybrid job.
# -npernode and -x are Open MPI mpirun options: -npernode starts
# $TASK_PER_NODE tasks on each node, and -x exports OMP_NUM_THREADS
# so that every MPI task starts the intended number of OpenMP threads.
mpirun -npernode $TASK_PER_NODE -x OMP_NUM_THREADS=$THREADS_PER_TASK -machinefile nodefile ./my_exe
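The hybrid script depends on two pieces of bookkeeping: uniq turns the per-core node file into one line per node, and TASK_PER_NODE x THREADS_PER_TASK should equal ppn so that each core runs exactly one thread. The sketch below checks both against a stand-in node file; the node names and the warning message are illustrative assumptions, not part of the cluster setup.

```shell
#!/bin/sh
# Sketch of the bookkeeping behind the hybrid job script.
# Assumed (hypothetical) allocation: nodes=2:ppn=16 on node001/node002.
PPN=16
TASK_PER_NODE=2
THREADS_PER_TASK=8

# Stand-in for $PBS_NODEFILE: each node name once per core, grouped by node.
pbs_nodefile=$(mktemp)
unique_nodes=$(mktemp)
for node in node001 node002; do
    core=0
    while [ "$core" -lt "$PPN" ]; do
        echo "$node"
        core=$((core + 1))
    done
done > "$pbs_nodefile"

# uniq collapses only *adjacent* duplicates; that suffices because each
# node's entries are contiguous. sort -u would not depend on that order.
uniq "$pbs_nodefile" > "$unique_nodes"
NNODES=$(( $(wc -l < "$unique_nodes") ))

# Each core should run exactly one thread: tasks/node x threads/task == ppn.
CORES_USED=$(( TASK_PER_NODE * THREADS_PER_TASK ))
if [ "$CORES_USED" -ne "$PPN" ]; then
    echo "warning: using $CORES_USED of $PPN cores per node" >&2
fi

TOTAL_TASKS=$(( NNODES * TASK_PER_NODE ))
echo "nodes=$NNODES mpi_tasks=$TOTAL_TASKS threads_per_task=$THREADS_PER_TASK"
rm -f "$pbs_nodefile" "$unique_nodes"
```

With these values it reports 2 nodes and 4 MPI tasks of 8 threads each, i.e. all 32 cores; a pure MPI run of the same allocation would instead start 32 single-threaded tasks.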

Non-uniform memory access (NUMA) effects are a common concern when running hybrid jobs. Each compute node has two sockets (physical processors), each with its own local memory, so a thread running on one socket reaches memory attached to the other socket more slowly. In theory, parallel efficiency is highest when the number of OpenMP threads per MPI task equals the number of cores per socket, i.e. one MPI task per socket. In practice, the best setting varies from code to code.
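The rule of thumb above (one MPI task per socket, one OpenMP thread per core of that socket) reduces to simple arithmetic. The helper numa_layout below is hypothetical; the two-socket counts passed for SuperMike and SuperMIC are assumptions based on the ppn values and the two-socket layout described here. On a compute node, lscpu reports the actual "Socket(s)" and "Core(s) per socket" values.

```shell
#!/bin/sh
# Hypothetical helper: suggest a NUMA-friendly hybrid layout with one MPI
# task per socket and one OpenMP thread per core of that socket.
# Usage: numa_layout CORES_PER_NODE SOCKETS_PER_NODE
numa_layout() {
    cores=$1
    sockets=$2
    # The || short-circuits, so the modulo never divides by zero.
    if [ "$sockets" -le 0 ] || [ $((cores % sockets)) -ne 0 ]; then
        echo "cannot split $cores cores evenly over $sockets sockets" >&2
        return 1
    fi
    echo "TASK_PER_NODE=$sockets THREADS_PER_TASK=$((cores / sockets))"
}

numa_layout 16 2   # SuperMike node (ppn=16), assuming two sockets
numa_layout 20 2   # SuperMIC node (ppn=20), assuming two sockets
```

This prints TASK_PER_NODE=2 THREADS_PER_TASK=8 and TASK_PER_NODE=2 THREADS_PER_TASK=10. Treat the result as a starting point and time a few layouts, since the best setting depends on the code.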

Known issues

system(), fork() and popen()

Calls to the C library functions system(), fork() and popen() are not supported by the InfiniBand driver under the current Linux kernel. Any code that makes these calls within the MPI scope (between MPI_Init and MPI_Finalize) will likely fail.