Submitting Multiple Dependent Jobs

Job dependencies defer the start of a job until the jobs it depends on have completed. They are specified with the --dependency option of the sbatch command, using the format below:

sbatch --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> ...
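
To make the format concrete, here is a minimal sketch that assembles a dependency string from a type and a list of job IDs. The function name build_dependency and the job IDs 27 and 28 are assumptions made for illustration; only the type:job_id[:job_id] layout and the comma separator come from the format above.

```shell
#!/bin/bash
# build_dependency TYPE JOBID [JOBID...]
# Hypothetical helper (not part of Slurm): prints a value suitable for
# the --dependency option, e.g. "afterok:27:28".
build_dependency() {
    local type=$1
    shift
    local spec=$type
    local id
    for id in "$@"; do
        spec+=":$id"    # append each job-id after a colon
    done
    printf '%s\n' "$spec"
}

# One type with two made-up job-ids
build_dependency afterok 27 28    # prints afterok:27:28

# Two type:job_id groups joined with a comma
echo "$(build_dependency afterok 27),$(build_dependency afterany 28)"    # prints afterok:27,afterany:28

# The result would then be passed to sbatch, for example:
#   sbatch --dependency="$(build_dependency afterok 27 28)" job3.sh
```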

Before using dependent jobs, please note that the overhead of starting and stopping a job in Slurm is very high: the scheduler needs to allocate node resources, check the nodes, and start your job, and after the job commands are done, it must reclaim the nodes for the next job. Therefore, if your jobs use the same configuration (i.e., the same number of nodes and cores), use a single job and run the dependent commands/tasks sequentially instead of using dependent jobs. It is much better to have fewer but longer-running jobs.
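
As a rough sketch of the single-job alternative, the script below runs three steps one after another inside one allocation. The functions step1, step2 and step3 are hypothetical placeholders for your real commands, and the #SBATCH values simply mirror the examples on this page. Chaining with && means a later step runs only if the previous one succeeded, which is similar in spirit to an afterok dependency.

```shell
#!/bin/bash
#SBATCH --time 1:00:00
#SBATCH --nodes 1

# step1, step2 and step3 are hypothetical placeholders; replace their
# bodies with the real commands of each dependent task.
step1() { echo "step 1 done"; }
step2() { echo "step 2 done"; }
step3() { echo "step 3 done"; }

# Run the steps sequentially; stop at the first step that fails.
run_steps() {
    step1 && step2 && step3
}

run_steps
```

Because all steps share one allocation, the scheduler overhead is paid once rather than once per step.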

Below is an example of submitting three jobs: job1.sh, job2.sh and job3.sh. job3.sh depends on the completion of job1.sh and job2.sh. In this very simple example, job1.sh and job2.sh first sleep for a few seconds and then write their job-id ($SLURM_JOBID) to a file named "depfile". job3.sh displays the content of "depfile" to confirm that job1.sh and job2.sh have both completed:

job1.sh:

#!/bin/bash
#SBATCH --time 1:00:00
#SBATCH --nodes 1

sleep 10 # sleep 10 seconds
echo "JOB1ID=$SLURM_JOBID" >> depfile # append the labeled job-id to depfile

exit

job2.sh:

#!/bin/bash
#SBATCH --time 1:00:00
#SBATCH --nodes 1

# sleep 5 seconds; on an idle cluster with at least 2 nodes, job2.sh will finish before job1.sh
sleep 5
echo "JOB2ID=$SLURM_JOBID" >> depfile # append the labeled job-id to depfile

exit

job3.sh:

#!/bin/bash
#SBATCH --time 1:00:00
#SBATCH --nodes 1

# show the content of depfile; it should contain the job-ids of both job1.sh and job2.sh
cat depfile

exit

We use the script below to submit the three jobs from the login node. With the --dependency option in Slurm (here with the afterok type), job3.sh starts only after job1.sh and job2.sh have both completed successfully.

submit.sh:

#!/bin/bash
# Do NOT submit this script using sbatch!
# Use the command below to get the job-id of the first job.
# sbatch prints a line containing the job-id just submitted
# (e.g., "Submitted batch job 27"); we use the cut command
# to extract the job-id (the fourth and last field).

JOBID1=$( sbatch job1.sh | cut -d' ' -f4 )
echo "Submitted batch job $JOBID1"

JOBID2=$( sbatch job2.sh | cut -d' ' -f4 )
echo "Submitted batch job $JOBID2"

# job3.sh depends on the completion of job1.sh and job2.sh
sbatch --dependency=afterok:$JOBID1:$JOBID2 job3.sh
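
As an alternative to parsing the "Submitted batch job" line with cut, sbatch offers a --parsable option that prints only the job-id (followed by ";" and the cluster name on multi-cluster setups). The sketch below wraps this in a helper; the name submit_job is hypothetical, and the guard around the calls is there only so the script does nothing on a machine without sbatch.

```shell
#!/bin/bash
# Run this on the login node, like submit.sh; do NOT submit it using sbatch.

# submit_job SCRIPT [SBATCH_OPTIONS...]
# Hypothetical helper: submits SCRIPT and prints only its job-id.
# With --parsable, sbatch prints "jobid" or "jobid;cluster".
submit_job() {
    local script=$1
    shift
    local out
    out=$(sbatch --parsable "$@" "$script") || return 1
    printf '%s\n' "${out%%;*}"    # drop an optional ";cluster" suffix
}

# Submit the three example jobs only if sbatch is available.
if command -v sbatch >/dev/null 2>&1; then
    JOBID1=$(submit_job job1.sh) || exit 1
    JOBID2=$(submit_job job2.sh) || exit 1
    # job3.sh starts only after both jobs complete successfully
    submit_job job3.sh --dependency="afterok:$JOBID1:$JOBID2"
fi
```

Checking the exit status of each submission avoids passing an empty job-id to --dependency if an earlier sbatch call fails.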

We then run the submit.sh bash script to submit the three dependent jobs. Note that this script is NOT a job script, so do NOT submit it using sbatch.

[fchen14@philip1 slurmdoc]$ ./submit.sh
Submitted batch job 27
Submitted batch job 28
Submitted batch job 29
# check the job status using squeue; note the (Dependency) reason shown for job-id 29
[fchen14@philip1 slurmdoc]$ squeue -u fchen14
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                27   checkpt  job1.sh  fchen14 CF       0:04      1 philip011
                28   checkpt  job2.sh  fchen14 CF       0:04      1 philip012
                29   checkpt  job3.sh  fchen14 PD       0:00      1 (Dependency)
# job2.sh (job-id=28) finishes first
[fchen14@philip1 slurmdoc]$ squeue -u fchen14
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                29   checkpt  job3.sh  fchen14 PD       0:00      1 (Dependency)
                27   checkpt  job1.sh  fchen14  R       0:14      1 philip011
# job3.sh starts after job1.sh (job-id=27) is finished
[fchen14@philip1 slurmdoc]$ squeue -u fchen14
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                29   checkpt  job3.sh  fchen14 CF       0:04      1 philip011
# all three jobs are finished
[fchen14@philip1 slurmdoc]$ squeue -u fchen14
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
# the output file of job3.sh (job-id 29) shows the job-ids of both job1.sh and job2.sh
[fchen14@philip1 slurmdoc]$ cat slurm-29.out
JOB2ID=28
JOB1ID=27