How to background and distribute unrelated processes #todo/update-for-slurm¶
Introduction¶
All compute nodes have more than one core/processor, so even if the job is not explicitly parallel (using OpenMP or MPI), it is still beneficial to be able to launch multiple jobs in a single submit script. This document will briefly explain how to launch multiple processes on a single computer node and among 2 or more compute nodes.
Note this method does NOT facilitate communication among processes, and is therefore not appropriate for use with parallelized executables using MPI; it is still okay to invoke multithreaded executables because they are invoked initially as a single process. The caveat there is one should not use more threads than there are cores on a single computer node.
Basic Use Case¶
A user has a serial executable they want to run multiple times on HPC resources, and wants to run many per submission script.
Instead of submitting one queue script per serial execution, the user wants to launch a serial process per available core.
Required Tools¶
- the shell (bash)
- ssh (for distributing processes to remote compute nodes)
Launching Multiple Processes - Basic Example¶
On a single compute node¶
This example assumes one knows of the number of available processors on a a single node. The following example is a bash shell script that launches each process and backgrounds the command using the & symbol.
#!/bin/bash
# -- the following 8 lines issue a command, using a subshell,
# into the background this creates 8 child processes, belonging
# to the current shell script when executed
/path/to/exe1 & # executable/script arguments may be passed as expected
/path/to/exe2 &
/path/to/exe3 &
/path/to/exe4 &
/path/to/exe5 &
/path/to/exe6 &
/path/to/exe7 &
/path/to/exe8 &
# -- now WAIT for all 8 child processes to finish
# this will make sure that the parent process does not
# terminate, which is especially important in batch mode
wait
On multiple compute nodes¶
Distributing processes onto remote compute nodes builds upon the single node example example.
When using multiple compute nodes, one can use the 8 commands from above for the mother superior node (i.e., the "home" node, which will be the compute node that the batch scheduler uses to execute the shell commands contained inside of the queue script). For the remote nodes, one must use the ssh to launch the command on the remote host.
#!/bin/bash
# Define where the input files are
export WORKDIR=/path/to/where/i/want/to/run/my/job
# -- the following 8 lines issue a command, using a subshell,
# into the background this creates 8 child processes, belonging
# to the current shell script when executed
# -- for mother superior, or "home", compute node
/path/to/exe1 &
/path/to/exe2 &
/path/to/exe3 &
/path/to/exe4 &
/path/to/exe5 &
/path/to/exe6 &
/path/to/exe7 &
/path/to/exe8 &
# -- for an additional, remote compute node
# Assuming executable is to be run on WORKDIR
ssh -n remotehost 'cd '$WORKDIR'; /path/to/exe1 ' &
ssh -n remotehost 'cd '$WORKDIR'; /path/to/exe2 ' &
ssh -n remotehost 'cd '$WORKDIR'; /path/to/exe3 ' &
ssh -n remotehost 'cd '$WORKDIR'; /path/to/exe4 ' &
ssh -n remotehost 'cd '$WORKDIR'; /path/to/exe5 ' &
ssh -n remotehost 'cd '$WORKDIR'; /path/to/exe6 ' &
ssh -n remotehost 'cd '$WORKDIR'; /path/to/exe7 ' &
ssh -n remotehost 'cd '$WORKDIR'; /path/to/exe8 ' &
# -- now WAIT for all 8 child processes to finish
# this will make sure that the parent process does not
# terminate, which is especially important in batch mode
wait
The example above will spawn 16 commands, 8 of the which run on the local compute node (i.e., mother superior) and 8 run on remotehost. The background token (&) backgrounds the command on the local node (for all 16 commands); the commands sent to remotehost are not back-grounded remotely because this would not allow the local machine to know when the local command (i.e., ssh) completed.
Note as it is often the case that one does not know the identity of the mother superior node or the set of remote compute nodes (i.e., remotehost) when submitting such a script to the batch scheduler, some more programming must be done to determine the identity of these nodes at runtime. The following section considers this and concludes with a basic, adaptive example.
Advanced Example for PBS Queuing Systems¶
The steps involved in the following example are:
- determine identity of mother superior
- determine list of all remote compute nodes
Assumptions:
- shared file system
- a list of all compute nodes assigned by PBS are contained in a file referenced with the environmental variable, ${PBS_NODEFILE}
- each compute node has 8 processing cores
Note this example still requires that one knows the number of cores available per compute node; in this example, 8 is assumed.
!#/bin/bash
# Define where the input files are
export WORKDIR=/path/to/where/i/want/to/run/my/job
# -- Get List of Unique Nodes and put into an array ---
NODES=($(uniq $PBS_NODEFILE ))
# -- Assuming all input files are in the WORKDIR directory (change accordingly)
cd $WORKDIR
# -- The first array element contains the mother superior
# while the second onwards contains the worker or remote nodes --
# -- for mother superior, or "home", compute node
/path/to/exe1 &
/path/to/exe2 &
/path/to/exe3 &
/path/to/exe4 &
/path/to/exe5 &
/path/to/exe6 &
/path/to/exe7 &
/path/to/exe8 &
# -- Launch 8 processes on the first remote node --
ssh ${NODES[1]} ' \
cd '$WORKDIR' \
/path/to/exe1 & \
/path/to/exe2 & \
/path/to/exe3 & \
/path/to/exe4 & \
/path/to/exe5 & \
/path/to/exe6 & \
/path/to/exe7 & \
/path/to/exe8 & ' &
# Repeat above script for other remote nodes
# Remember that bash arrays start from 0 not 1 similar to C Language
# You can also use a loop over the remote nodes
#NUMNODES=$(uniq $PBS_NODEFILE | wc -l | awk '{print $1-1}')
# for i in $(seq 1 $NUMNODES ); do
# ssh -n ${NODES[$i]} ' launch 8 processors ' &
# done
# -- now WAIT for all child processes to finish
# this will make sure that the parent process does not
# terminate, which is especially important in batch mode
wait
Submitting to PBS¶
Now let's assume you have 128 tasks to run. You can do this by running on 16 8-core nodes using PBS. If one task takes 7 hours, and allowing a 30 minute safety margin, the following qsub command line will take care of running all 128 tasks:
% qsub -I -A allocation_account -V -l walltime=07:30:00,nodes=16:ppn=8
When the script executes, it will find 16 node names in its PBS_NODEFILE, run 1 set of 8 tasks on the mother superior, and 15 sets of 8 on the remote nodes.
Advanced Usage Possibilities¶
-
executables in the above script can take normal arguments and flags; e.g.:
/path/to/exe1 arg1 arg2 -flag arg3 &
-
one can, technically, initiate multithreaded executables; the catch is to make sure there is only one thread per processing core is allowed; the following example launches 4, 2-threaded executables (presumably using OpenMP) locally - thus consuming all 8 (i.e., 4 processes * 2 threads/process), each.
# "_r" simply denotes that executable is multithreaded
OMP_NUM_THREADS=2 /path/to/exe1_r &
OMP_NUM_THREADS=2 /path/to/exe2_r &
OMP_NUM_THREADS=2 /path/to/exe3_r &
OMP_NUM_THREADS=2 /path/to/exe4_r &
# -- in total, 8 threads are being executed on the 8 (assumed) cores
# contained by this compute node; i.e., there are 4 processes, each
# with 2 threads running.
A creative user can get even more complicated, but in general there is no need.
Conclusion¶
In situations where one wants to take advantage of all processors available on a single compute node or a set of compute nodes to run a number of unrelated processes, using background processes locally (and via ssh) remotely allows one to do this.
Converting the above scripts to csh and tcsh is straightforward
Using the examples above, one may create a customized solution using the proper techniques. For questions and comments regarding this topic and the examples above, please email us.