Quick Start
Upstream Slurm documentation can be a helpful resource for frequently asked questions and reference documentation.
If upstream behavior does not match HPC cluster behavior, please contact us with the expected behavior and a URL from the documentation above.
Site-specific instructions for common workflows are shown below.
Submit a Batch Job
To create a batch Slurm script, use your favorite editor (e.g., vi or emacs) to create a text file containing both Slurm directives and the commands to run your job. All Slurm directives (special instructions) are prefaced by #SBATCH and must appear before the first executable command in the script; Slurm ignores any #SBATCH lines that come after it. The following is an example of a Slurm batch job script:
#!/bin/bash
#SBATCH -N 1 # request one node
#SBATCH -t 2:00:00 # request two hours
#SBATCH -p single # in single partition (queue)
#SBATCH -A your_allocation_name
#SBATCH -o slurm-%j.out-%N # optional, stdout file name built from the job ID (%j) and the node's hostname (%N)
#SBATCH -e slurm-%j.err-%N # optional, stderr file name built from the same job ID and hostname values
# below are job commands
date
# Set some handy environment variables.
export HOME_DIR=/home/$USER/myjob
export WORK_DIR=/work/$USER/myjob
# Make sure the WORK_DIR exists:
mkdir -p "$WORK_DIR"
# Copy files, jump to WORK_DIR, and execute a program called "mydemo"
cp "$HOME_DIR/mydemo" "$WORK_DIR"
cd "$WORK_DIR" || exit 1   # stop if WORK_DIR is not accessible
./mydemo
# Mark the time it finishes.
date
# exit the job
exit 0
To submit the above job to the scheduler, save the script as a text file, e.g., singlenode.sh, then submit it with the command below:
sbatch singlenode.sh
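When submissions are scripted, it can help to capture the job ID that sbatch assigns. The sketch below relies on sbatch's --parsable option, which prints only the job ID (followed by ";clustername" on multi-cluster setups); the helper name parse_job_id is hypothetical.

```bash
# Hypothetical helper: strip an optional ";cluster" suffix from the output of
# "sbatch --parsable", leaving just the job ID.
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Usage on a login node (requires Slurm, so shown as comments):
#   jobid=$(parse_job_id "$(sbatch --parsable singlenode.sh)")
#   squeue -j "$jobid"    # check the job's state in the queue
```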
List of useful Slurm directives and their meaning:
#SBATCH -A allocationname: short for --account, charge jobs to your allocation named allocationname.
#SBATCH -J jobname: short for --job-name, name of the job.
#SBATCH -n ntasks: short for --ntasks, number of tasks (CPU cores) to run the job on. Jobs are limited to 4 GB of memory per CPU core requested.
#SBATCH -N nnodes: short for --nodes, number of nodes on which to run.
#SBATCH -c nthreads: short for --cpus-per-task, number of threads (CPU cores) per process.
#SBATCH -p partition: short for --partition, submit job to the partition queue.
Allowed values for partition: single, checkpt, workq, gpu, bigmem.
Depending on the cluster, additional partitions may be available; list them with the sinfo command.
#SBATCH -t hh:mm:ss: short for --time, request resources to run job for hh hours, mm minutes and ss seconds.
#SBATCH -o filename.out: short for --output, write standard output to file filename.out.
#SBATCH -e filename.err: short for --error, write standard error to file filename.err.
Note that if neither "-o" nor "-e" is set, Slurm writes standard output and standard error to the same file (slurm-%j.out by default); if "-o" is set without "-e", both streams go to the "-o" file.
#SBATCH --mail-user your@email.address: address to send email to when the --mail-type directive below is triggered.
#SBATCH --mail-type type: send an email when a job event of the given type occurs. Common values for type include BEGIN, END, FAIL or ALL. Values can be combined as a comma-separated list with no spaces, e.g., BEGIN,END sends email when the job begins and when it ends.
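The directives above can also be written with their long option names, which some find easier to read. A sketch of a job script header using them; the job name mydemo, the task count, and the email address are placeholders, not required values.

```bash
#!/bin/bash
#SBATCH --account=your_allocation_name  # same as -A
#SBATCH --job-name=mydemo               # same as -J (placeholder name)
#SBATCH --nodes=1                       # same as -N
#SBATCH --ntasks=4                      # same as -n; 4 cores, so up to 16 GB at 4 GB per core
#SBATCH --partition=single              # same as -p
#SBATCH --time=02:00:00                 # same as -t (hh:mm:ss)
#SBATCH --output=slurm-%j.out           # same as -o
#SBATCH --error=slurm-%j.err            # same as -e
#SBATCH --mail-user=your@email.address
#SBATCH --mail-type=BEGIN,END           # email when the job begins and ends
```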
List of common, useful Slurm environment variables and their meaning:
SLURM_JOB_ID: Job ID number given to this job.
SLURM_JOB_NODELIST: List of nodes allocated to the job.
SLURM_SUBMIT_DIR: Directory where the sbatch command was executed.
SLURM_NNODES: Total number of nodes in the job's resource allocation.
SLURM_NTASKS: Total number of tasks (CPU cores) requested in a job.
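A sketch of how a job script might use these variables, for example to start in the submission directory and log what the job was given. The fallback values ("unset" and ".") are assumptions added so the snippet also runs outside a job, and job_summary is a hypothetical helper name. If individual host names are needed, scontrol show hostnames can expand a compressed node list.

```bash
# Print a one-line summary of the current job from Slurm's environment variables.
# The value after ":-" is used when a variable is not set (e.g. outside a job).
job_summary() {
    printf 'job=%s nodes=%s tasks=%s nodelist=%s\n' \
        "${SLURM_JOB_ID:-unset}" "${SLURM_NNODES:-unset}" \
        "${SLURM_NTASKS:-unset}" "${SLURM_JOB_NODELIST:-unset}"
}

# Start in the directory where sbatch was run (falls back to the current directory).
cd "${SLURM_SUBMIT_DIR:-.}" || exit 1
job_summary
```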
Multiple Node Batch Job
A multiple-node job script is very similar to a single-node one; the difference is that it requests several nodes and launches the program across them with srun. Below is an example of a multiple-node batch job script:
#!/bin/bash
#SBATCH -N 2 # request two nodes
#SBATCH -n 16 # specify 16 MPI processes (8 per node)
#SBATCH -c 6 # specify 6 threads per process
#SBATCH -t 2:00:00
#SBATCH -p checkpt
#SBATCH -A your_allocation_name
#SBATCH -o slurm-%j.out-%N # optional, stdout file name built from the job ID (%j) and the first node's hostname (%N)
#SBATCH -e slurm-%j.err-%N # optional, stderr file name built from the same job ID and first-node values
# below are job commands
date
# Set some handy environment variables.
export HOME_DIR=/home/$USER/myjob
export WORK_DIR=/work/$USER/myjob
# Load appropriate modules, in this case MPICH built with the Intel compilers
module load mpich/3.1.4/INTEL-15.0.3
# Make sure the WORK_DIR exists:
mkdir -p "$WORK_DIR"
# Copy files, jump to WORK_DIR, and execute a program called "my_mpi_demo"
cp "$HOME_DIR/my_mpi_demo" "$WORK_DIR"
cd "$WORK_DIR" || exit 1   # stop if WORK_DIR is not accessible
srun -N2 -n16 -c6 ./my_mpi_demo # Launch the MPI application with two nodes, 8 MPI processes each node, and 6 threads per MPI process.
# Mark the time it finishes.
date
# exit the job
exit 0
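The resource comments in this script follow from simple arithmetic: 16 tasks over 2 nodes is 8 tasks per node, and 8 tasks x 6 threads is 48 cores per node, which must not exceed the cores a node actually has. A sketch of that check; check_layout is a hypothetical helper, and the 48-core node size in the usage line is an assumption, not a documented cluster value.

```bash
# Hypothetical helper: given nodes (-N), tasks (-n), threads per task (-c) and
# cores per node, report the per-node layout and whether it fits on a node.
check_layout() {
    nodes=$1 ntasks=$2 cpus_per_task=$3 cores_per_node=$4
    # Tasks per node, rounded up in case ntasks is not a multiple of nodes.
    tasks_per_node=$(( (ntasks + nodes - 1) / nodes ))
    cores_used=$(( tasks_per_node * cpus_per_task ))
    if [ "$cores_used" -le "$cores_per_node" ]; then
        echo "ok: $tasks_per_node tasks/node, $cores_used of $cores_per_node cores/node"
    else
        echo "too many: $cores_used cores/node needed, $cores_per_node available"
        return 1
    fi
}

check_layout 2 16 6 48    # the example job, assuming (hypothetically) 48-core nodes
```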
Note: in the multiple-node example above, the srun command is used to launch the MPI application; this is the default way to launch MPI programs on the cluster.
The syntax for the srun command is:
srun <flags> <name of the MPI executable>
Some useful flags are:
-N: number of nodes
-n: total number of MPI processes
-c: number of threads per MPI process
-u: turn on unbuffered output (the output from MPI processes is flushed to stdout as soon as it is generated); without this flag, Slurm will buffer and rearrange the output according to the MPI ranks.
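The -c flag reserves cores for each MPI process but does not by itself make a program start that many threads. If my_mpi_demo is a hybrid MPI/OpenMP program (an assumption; the example does not say how it is threaded), the thread count is commonly passed through OMP_NUM_THREADS, which can be taken from SLURM_CPUS_PER_TASK, the variable Slurm sets when -c is requested. A sketch, with omp_threads as a hypothetical helper name:

```bash
# Threads per MPI process: the cores reserved with -c (exported by Slurm as
# SLURM_CPUS_PER_TASK), or 1 if -c was not requested.
omp_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

export OMP_NUM_THREADS="$(omp_threads)"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# In the job script, set this before the launch line, e.g.:
#   srun -N2 -n16 -c6 ./my_mpi_demo
```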
Interactive Jobs
Related upstream Slurm FAQ entries:
https://slurm.schedmd.com/faq.html#sbatch_srun
https://slurm.schedmd.com/faq.html#prompt
To start an interactive job, use the salloc command similar to the example below:
salloc -t 1:00:00 -n8 -N1 -A your_allocation_name -p single
As in the batch job script, -n8 requests 8 tasks (cores) and -N1 requests one compute node. The equivalent command using long option names is:
salloc --time=1:00:00 --ntasks=8 --nodes=1 --account=your_allocation_name --partition=single
Note: If an interactive job is submitted to a partition other than single, the -n or --ntasks flag is ignored and one or more entire nodes are allocated to the job.
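Once salloc grants the allocation, the session has the same Slurm environment variables as a batch job (such as SLURM_JOB_ID and SLURM_JOB_NODELIST), so checking SLURM_JOB_ID is a quick way to confirm that a shell is inside an interactive job. A sketch; in_slurm_job is a hypothetical helper name. Typing exit ends the salloc session and releases the allocation.

```bash
# Hypothetical helper: succeed (exit status 0) if this shell is inside a Slurm job.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "inside job $SLURM_JOB_ID on nodes ${SLURM_JOB_NODELIST:-?}"
else
    echo "not inside a Slurm allocation"
fi
```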
Jobs Using GPUs
For jobs using GPUs, the number of GPU devices must be explicitly specified with the --gres=gpu:N flag, where N is the number of GPUs per node.
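Inside a GPU job it can be useful to confirm how many devices are actually visible. On many Slurm installations (an assumption about this cluster's GPU configuration), NVIDIA GPUs granted through --gres are exposed via CUDA_VISIBLE_DEVICES as a comma-separated list of device indices; nvidia-smi -L lists them as well. A sketch that counts the entries, with count_visible_gpus as a hypothetical helper name:

```bash
# Count the GPUs listed in CUDA_VISIBLE_DEVICES (comma-separated, e.g. "0,1").
# Prints 0 if the variable is unset or empty. Runs in a subshell so the IFS and
# positional-parameter changes do not leak into the caller.
count_visible_gpus() (
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo 0
    else
        set -f                       # no globbing while splitting
        IFS=,
        set -- $CUDA_VISIBLE_DEVICES # one positional parameter per device
        echo "$#"
    fi
)

echo "visible GPUs: $(count_visible_gpus)"
```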
Requesting One GPU
If a job cannot use multiple GPU devices efficiently, or if it is a test job, request one GPU. In this case, the job will share a node with other jobs.
For an interactive session requesting one GPU:
salloc -t hh:mm:ss -N1 -n16 --gres=gpu:1 -p gpu_partition_name -A your_allocation_name
For a batch job requesting one GPU:
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 16
#SBATCH -t hh:mm:ss
#SBATCH -p gpu_partition_name
#SBATCH --gres=gpu:1
#SBATCH -A your_allocation_name
# commands to run
Please note that the valid values for the number of tasks ("-n") range from 1 to (total number of CPU cores on the node)/(total number of GPUs on the node). For instance, if a job requests one GPU on a node with 64 cores and 4 GPUs, the valid values for "-n" are 1 to 64 / 4 = 16.
Requesting More Than One GPU (But Less Than A Node)
Users can request more than one GPU on a node (e.g., 2 or 3 GPUs on a node with 4 GPUs). In this case, the job will also share a node with other jobs.
For an interactive session requesting multiple GPUs:
salloc -t hh:mm:ss -N1 -n32 --gres=gpu:2 -p gpu_partition_name -A your_allocation_name
For a batch job requesting multiple GPUs:
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 32
#SBATCH -t hh:mm:ss
#SBATCH -p gpu_partition_name
#SBATCH --gres=gpu:2
#SBATCH -A your_allocation_name
# commands to run
Please note that the valid values for the number of tasks ("-n") range from 1 to (number of GPUs requested)*(total number of CPU cores on the node)/(total number of GPUs on the node). For instance, if a job requests 2 GPUs on a node with 64 cores and 4 GPUs, the valid values for "-n" are 1 to (2 x 64) / 4 = 32.
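The upper limit on "-n" can be computed with shell arithmetic. A sketch; max_gpu_tasks is a hypothetical helper, and the node's core and GPU counts must be looked up for the actual GPU partition (for example with sinfo) rather than assumed.

```bash
# Largest valid -n for a GPU job, following the rule above:
#   max_n = gpus_requested * cores_per_node / gpus_per_node
max_gpu_tasks() {
    gpus_requested=$1 cores_per_node=$2 gpus_per_node=$3
    echo $(( gpus_requested * cores_per_node / gpus_per_node ))
}

max_gpu_tasks 1 64 4    # one GPU on a 64-core, 4-GPU node: 16
max_gpu_tasks 2 64 4    # two GPUs on the same node type: 32
```

Requesting all 4 GPUs on such a node gives 64, which matches the -n64 used in the whole-node examples.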
Requesting One GPU Node With All Its GPUs
For an interactive session requesting one GPU node with all its GPUs (either 2 or 4, depending on the node configuration):
salloc -t hh:mm:ss -N1 -n64 --gres=gpu:number_of_gpus -p gpu_partition_name -A your_allocation_name
For a batch job requesting one GPU node and all its GPUs:
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 64
#SBATCH -t hh:mm:ss
#SBATCH -p gpu_partition_name
#SBATCH --gres=gpu:number_of_gpus
#SBATCH -A your_allocation_name
# commands to run
Requesting Multiple GPU Nodes With All Their GPUs
For an interactive session requesting multiple GPU nodes with all their GPUs (either 2 or 4, depending on the node configuration):
salloc -t hh:mm:ss -N number_of_gpu_nodes --gres=gpu:number_of_gpus -p gpu_partition_name -A your_allocation_name
For a batch job requesting multiple GPU nodes with all their GPUs:
#!/bin/bash
#SBATCH -N number_of_gpu_nodes
#SBATCH -t hh:mm:ss
#SBATCH -p gpu_partition_name
#SBATCH --gres=gpu:number_of_gpus
#SBATCH -A your_allocation_name
# commands to run
Please note that the value of "number_of_gpus" is the number of GPUs PER NODE, not the total number of GPUs that will be allocated to the job. For instance, when requesting 2 nodes with 4 GPUs on each node, the flag should be --gres=gpu:4 (8 GPUs in total).
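Because the --gres count is per node, the job's total GPU count is the node count times that value. A sketch; total_gpus is a hypothetical helper name, and inside a running job the node count is also available as SLURM_NNODES.

```bash
# --gres=gpu:N is a per-node count, so the job's total is nodes x N.
total_gpus() {
    echo $(( $1 * $2 ))    # $1 = number of nodes (-N), $2 = N in --gres=gpu:N
}

total_gpus 2 4    # 2 nodes with --gres=gpu:4 -> 8 GPUs in total
```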