How to Background and Distribute Unrelated Processes
All compute nodes have more than one core/processor, so even a job that is not explicitly parallel (through OpenMP or MPI) can benefit from launching several processes from a single submit script. This document briefly explains how to launch multiple processes on a single compute node and across multiple compute nodes using SLURM-supported methods.
Running Many Independent Tasks with SLURM
Many scientific workflows require running the same program many times with different inputs, parameters, or random seeds. These tasks are often independent, meaning they do not need to communicate with each other while running.
For this type of workload, users should generally avoid manually backgrounding processes with & or using ssh to launch tasks on compute nodes. Instead, users should use one of the following SLURM-supported approaches:
- SLURM job arrays
- GNU Parallel inside a SLURM job
- srun for launching serial, multithreaded, or MPI tasks
These methods allow SLURM to manage resources, task placement, accounting, and scheduling more reliably.
When to Use Each Method
Use SLURM Job Arrays When
you have many similar independent jobs, such as:
- Running the same program on many input files
- Running simulations with different parameters
- Processing many datasets independently
- Running one task per array index
Each array task is scheduled as a separate SLURM job and receives its own index through the environment variable:
$SLURM_ARRAY_TASK_ID
Use GNU Parallel When
you have a list of many independent commands and want to run several of them concurrently inside a single SLURM job allocation.
GNU Parallel is useful when:
- You have many short serial jobs
- You want to reduce scheduler overhead
- You want to run multiple commands from a command list
- You want better control over how many tasks run at the same time
SLURM Job Array Examples
Example 1: Serial Job Array
This example runs one serial task per array index.
Assume the input files are named:
input_1.dat
input_2.dat
input_3.dat
...
input_100.dat
Create a SLURM script named:
serial_array.slurm
#!/bin/bash
#SBATCH -J serial_array
#SBATCH -A allocation_account
#SBATCH -p partition_name
#SBATCH -a 1-100
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -t 01:00:00
#SBATCH -o logs/serial_%A_%a.out
#SBATCH -e logs/serial_%A_%a.err
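# SLURM opens the -o/-e files before this script starts, so the logs directory
# must already exist at submission time (run "mkdir -p logs" before sbatch).
mkdir -p logs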
# Each array task gets a unique task ID
INPUT_FILE="input_${SLURM_ARRAY_TASK_ID}.dat"
OUTPUT_FILE="output_${SLURM_ARRAY_TASK_ID}.dat"
echo "Running task ID: ${SLURM_ARRAY_TASK_ID}"
echo "Input file: ${INPUT_FILE}"
echo "Output file: ${OUTPUT_FILE}"
./my_serial_program "$INPUT_FILE" "$OUTPUT_FILE"
Submit the job array:
sbatch serial_array.slurm
This submits 100 independent tasks. Each task uses a different value of SLURM_ARRAY_TASK_ID.
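When input files are not numbered consecutively, one common pattern is to keep a list of filenames and let each array task pick the line matching its index. The following is a minimal sketch of that idea, not a site-specific recipe: the helper `nth_line`, the `sample_*.dat` names, and the temporary list file are all hypothetical, and the task ID falls back to 2 so the script can be tried outside of SLURM.

```shell
#!/bin/bash
# Hypothetical helper: print the Nth line (1-based) of a list file.
nth_line() {
    local index="$1" list_file="$2"
    sed -n "${index}p" "$list_file"
}

# Demo list with made-up names; in a real job this file would already exist.
LIST_FILE="$(mktemp)"
trap 'rm -f "$LIST_FILE"' EXIT
printf '%s\n' sample_a.dat sample_b.dat sample_c.dat > "$LIST_FILE"

# Inside an array job SLURM sets SLURM_ARRAY_TASK_ID; default to 2 for a dry run.
TASK_ID="${SLURM_ARRAY_TASK_ID:-2}"
INPUT_FILE="$(nth_line "$TASK_ID" "$LIST_FILE")"
echo "Task ${TASK_ID} would process: ${INPUT_FILE}"
```

With a list of N lines, the matching array range would be `-a 1-N`.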
Example 2: Limit the Number of Simultaneous Array Tasks
If you have many jobs but do not want all of them to run at once, use the % limit.
For example:
#SBATCH -a 1-100%10
This submits 100 array tasks but allows only 10 to run at the same time.
Example script:
#!/bin/bash
#SBATCH -J limited_array
#SBATCH -A allocation_account
#SBATCH -p partition_name
#SBATCH -a 1-100%10
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -t 01:00:00
#SBATCH -o logs/limited_%A_%a.out
#SBATCH -e logs/limited_%A_%a.err
# logs/ must already exist at submission time; create it before running sbatch
mkdir -p logs
INPUT_FILE="input_${SLURM_ARRAY_TASK_ID}.dat"
OUTPUT_FILE="output_${SLURM_ARRAY_TASK_ID}.dat"
./my_serial_program "$INPUT_FILE" "$OUTPUT_FILE"
This approach is useful when running thousands of jobs without overwhelming the scheduler or the file system.
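A complementary way to keep the number of array tasks manageable is to let each task work through a block of inputs instead of a single one. The sketch below assumes 1000 inputs and 10 inputs per task, which would pair with `-a 1-100`; the helper `chunk_range` and both numbers are illustrative assumptions. It only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical helper: print the first and last input index for one array task.
# Arguments: task ID (1-based), inputs per task, total number of inputs.
chunk_range() {
    local task_id="$1" chunk_size="$2" total="$3"
    local first=$(( (task_id - 1) * chunk_size + 1 ))
    local last=$(( task_id * chunk_size ))
    if (( last > total )); then
        last="$total"   # the final chunk may be shorter
    fi
    echo "$first $last"
}

# Default to task 1 so the script can be tried outside of SLURM.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
read -r FIRST LAST < <(chunk_range "$TASK_ID" 10 1000)

# Dry run: print the commands instead of executing them.
for (( i = FIRST; i <= LAST; i++ )); do
    echo "./my_serial_program input_${i}.dat output_${i}.dat"
done
```

Remember to scale the requested wall time to the number of inputs each task handles.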
GNU Parallel Examples
Example 3: GNU Parallel with Serial Tasks
GNU Parallel can run many independent commands from a command file.
First, create a file named:
commands.txt
Example contents:
./my_serial_program input_1.dat output_1.dat
./my_serial_program input_2.dat output_2.dat
./my_serial_program input_3.dat output_3.dat
./my_serial_program input_4.dat output_4.dat
./my_serial_program input_5.dat output_5.dat
./my_serial_program input_6.dat output_6.dat
./my_serial_program input_7.dat output_7.dat
./my_serial_program input_8.dat output_8.dat
Then create a SLURM script named gnu_parallel_serial.slurm:
#!/bin/bash
#SBATCH -J gnu_parallel_serial
#SBATCH -A allocation_account
#SBATCH -p partition_name
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH -t 02:00:00
#SBATCH -o logs/parallel_%j.out
#SBATCH -e logs/parallel_%j.err
# logs/ must already exist at submission time; create it before running sbatch
mkdir -p logs
module load parallel
# Run up to 8 commands at a time (one per allocated CPU core)
parallel -j "$SLURM_CPUS_PER_TASK" < commands.txt
Submit the job:
sbatch gnu_parallel_serial.slurm
In this example:
- The SLURM job requests 8 CPU cores.
- GNU Parallel runs up to 8 serial commands at the same time.
- Each command uses 1 CPU core.
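For longer lists, a command file like commands.txt does not need to be typed by hand. A minimal sketch, assuming the same `input_N.dat` / `output_N.dat` naming and 8 inputs (adjust `NUM_INPUTS`, an illustrative variable, as needed):

```shell
#!/bin/bash
# Write one command per line for inputs 1..NUM_INPUTS (the count is an assumption).
NUM_INPUTS=8
for i in $(seq 1 "$NUM_INPUTS"); do
    printf './my_serial_program input_%d.dat output_%d.dat\n' "$i" "$i"
done > commands.txt

# Quick sanity check: number of lines and the first command.
wc -l < commands.txt
head -n 1 commands.txt
```

Run this once on the login node before submitting the job, so that commands.txt exists when the job starts.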
Example 4: GNU Parallel with Input Files
Instead of writing every command manually, GNU Parallel can read a list of input files.
Create a file named:
input_files.txt
Example contents:
input_1.dat
input_2.dat
input_3.dat
input_4.dat
input_5.dat
input_6.dat
input_7.dat
input_8.dat
SLURM script:
#!/bin/bash
#SBATCH -J gnu_parallel_inputs
#SBATCH -A allocation_account
#SBATCH -p partition_name
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH -t 02:00:00
#SBATCH -o logs/parallel_inputs_%j.out
#SBATCH -e logs/parallel_inputs_%j.err
# logs/ must already exist at submission time; create it before running sbatch
mkdir -p logs
module load parallel
parallel -j $SLURM_CPUS_PER_TASK './my_serial_program {} {.}.out' :::: input_files.txt
In this example:
- {} is replaced by the input filename
- {.} is replaced by the filename without its extension
For example:
input_1.dat -> input_1.out
input_2.dat -> input_2.out
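To check the naming before submitting anything, the same mapping can be reproduced in plain bash. This is only an illustration of what `{.}.out` produces for simple filenames, not how GNU Parallel implements it; the helper `output_name` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: mimic the '{.}.out' naming with bash parameter expansion.
output_name() {
    local input="$1"
    echo "${input%.*}.out"   # strip the last extension, then append .out
}

for f in input_1.dat input_2.dat; do
    echo "$f -> $(output_name "$f")"
done
```

GNU Parallel also has a `--dry-run` option that prints the commands it would run without executing them, which is a convenient way to confirm the substitutions against the real input list.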
Choosing the Right Approach
| Workflow Type | Recommended Method |
|---|---|
| Many independent serial jobs | SLURM job array or GNU Parallel |
| Many short serial jobs | GNU Parallel inside one SLURM job |
| Many independent jobs with different input files | SLURM job array |
| Very large number of jobs | Job array with a concurrency limit |
| Independent multithreaded jobs | Job array with -c or GNU Parallel with controlled -j |
| One parallel MPI application | Single SLURM job using srun |
| Many independent MPI simulations | MPI job array or GNU Parallel with srun --exclusive |
| Hybrid MPI/OpenMP application | SLURM job using --ntasks-per-node and --cpus-per-task |
Best Practices
Avoid Manual ssh
Users should generally avoid using ssh inside SLURM jobs to launch tasks on compute nodes. Processes started this way bypass some of SLURM's resource management and accounting.
Use srun, SLURM job arrays, or GNU Parallel instead.
Avoid Oversubscription
Do not run more processes or threads than the resources requested.
For example, if you request:
#SBATCH -c 4
then set:
export OMP_NUM_THREADS=4
For GNU Parallel, make sure:
number of simultaneous jobs × threads per job <= CPUs allocated
Example:
4 simultaneous jobs × 4 threads each = 16 CPU cores needed
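The same arithmetic can be done inside the job script so that the `-j` value always follows the allocation. A minimal sketch, assuming 4 threads per job and falling back to 16 CPUs when run outside of SLURM; the helper `max_parallel_jobs` is hypothetical, and the last line only prints the command it would run.

```shell
#!/bin/bash
# Hypothetical helper: how many jobs fit when each job uses a fixed number of threads.
max_parallel_jobs() {
    local cpus="$1" threads_per_job="$2"
    local n=$(( cpus / threads_per_job ))   # integer division rounds down
    if (( n < 1 )); then
        n=1   # fewer CPUs than threads: run one job at a time (still oversubscribed)
    fi
    echo "$n"
}

# SLURM sets SLURM_CPUS_PER_TASK when -c is given; 16 is a stand-in for a dry run.
CPUS="${SLURM_CPUS_PER_TASK:-16}"
THREADS_PER_JOB=4
export OMP_NUM_THREADS="$THREADS_PER_JOB"

JOBS="$(max_parallel_jobs "$CPUS" "$THREADS_PER_JOB")"
echo "parallel -j ${JOBS} < commands.txt"
```

Rounding down leaves some cores idle when the CPU count is not a multiple of the thread count, which is generally preferable to oversubscribing.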
Use Job Array Concurrency Limits
For large workflows, avoid launching thousands of jobs at the same time.
Use:
#SBATCH -a 1-1000%50
This submits 1000 array tasks but runs only 50 at a time.
Use Separate Output Files
For job arrays, use %A and %a in output filenames:
#SBATCH -o logs/job_%A_%a.out
#SBATCH -e logs/job_%A_%a.err
Where:
- %A is the main SLURM job ID
- %a is the array task ID
Use srun for MPI
MPI jobs should generally be launched with srun unless the cluster documentation recommends another MPI launcher.
Example:
srun ./my_mpi_program
Conclusion
For running many independent tasks on SLURM clusters, users should use SLURM job arrays or GNU Parallel instead of manually backgrounding commands or launching tasks with ssh.
SLURM job arrays are well suited for large numbers of similar jobs, especially when each task can be identified by an array index. GNU Parallel is useful for running many independent commands within a single SLURM allocation. MPI jobs should be launched with srun, and hybrid MPI/OpenMP jobs should carefully match MPI ranks and CPU threads to the requested resources.
These approaches improve resource usage, reduce scheduler overhead, and allow SLURM to properly manage task placement and accounting.
The examples above can be adapted into a solution customized for your own workflow. For questions and comments regarding this topic and the examples above, please email us.