
SuperMIC

Table of Contents

Access to SuperMIC
Transferring Files to and from SuperMIC
Computing Environment
File Systems
Application Development
Running Applications

System Overview

SuperMIC (pronounced "Super Mick") is an LSU supercomputer funded by a National Science Foundation (NSF) Major Research Instrumentation (MRI) award to the Center for Computation & Technology.

SuperMIC is capable of a peak theoretical performance of over 925 TF. It achieved a performance of 557 TF during testing, which placed it as number 65 in the June 2014 Top500 List.

SuperMIC became operational on October 1, 2014. It contains a total of 382 nodes, each with two 10-core 2.8GHz Intel Ivy Bridge-EP processors. The 380 compute nodes each have 64 GB of memory and 500 GB of local HDD storage: 360 of them have 2 Intel Xeon Phi 7120P coprocessors, and the other 20 have 1 Intel Xeon Phi 7120P coprocessor and 1 NVIDIA Tesla K20X.

LSU users need their LSU HPC credentials to access SuperMIC (see: LSU HPC account request) and an LSU HPC allocation (see: LSU HPC allocation request) to run production jobs on the system.

Four compute nodes, each equipped with two Intel Skylake processors, two NVIDIA V100 GPU devices and 2 TB of NVMe SSD, were recently added to SuperMIC. A detailed description of these nodes can be found below.

Configuration

1 Login Node
- Two 2.8GHz 10-Core Ivy Bridge-EP E5-2680 Xeon 64-bit Processors
- One Intel Xeon Phi 7120P Coprocessor
- 128GB DDR3 1866MHz Ram
- 1TB HD
- 56 Gigabit/sec Infiniband network interface
- 10 Gigabit Ethernet network interface
- Red Hat Enterprise Linux 6
1 Login Node
- Two 2.8GHz 10-Core Ivy Bridge-EP E5-2680 Xeon 64-bit Processors
- One NVIDIA Tesla K20X 6GB GPU
- 128GB DDR3 1866MHz Ram
- 1TB HD
- 56 Gigabit/sec Infiniband network interface
- 10 Gigabit Ethernet network interface
- Red Hat Enterprise Linux 6
360 Compute Nodes
- Two 2.8GHz 10-Core Ivy Bridge-EP E5-2680 Xeon 64-bit Processors
- Two Intel Xeon Phi 7120P Coprocessors
- 64GB DDR3 1866MHz Ram
- 500GB HD
- 56 Gigabit/sec Infiniband network interface
- 1 Gigabit Ethernet network interface
- Red Hat Enterprise Linux 6
20 Hybrid Compute Nodes
- Two 2.8GHz 10-Core Ivy Bridge-EP E5-2680 Xeon 64-bit Processors
- One Intel Xeon Phi 7120P Coprocessor
- One NVIDIA Tesla K20X 6GB GPU with GPUDirect Support
- 64GB DDR3 1866MHz Ram
- 500GB HD
- 56 Gigabit/sec Infiniband network interface
- 1 Gigabit Ethernet network interface
- Red Hat Enterprise Linux 6
3 Big Memory Compute Nodes
- Two 2.6GHz 14-Core Broadwell E5-2690 v4 Xeon 64-bit Processors
- 256GB DDR4 2400MHz Ram
- 56 Gigabit/sec Infiniband network interface
- Red Hat Enterprise Linux 6
4 V100 Compute Nodes
- Two 2.7GHz 18-Core Skylake Gold 6150 Xeon 64-bit Processors
- Two NVIDIA V100 16GB GPUs
- 384GB DDR4 2666MHz Ram
- 2TB NVMe Solid State Drive
- 56 Gigabit/sec Infiniband network interface
- Red Hat Enterprise Linux 6
Cluster Storage
- 840TB Lustre High-Performance disk
- 5TB NFS-mounted /home disk storage

1. System Access to SuperMIC

1.1. SSH (for LSU users only)

Note: for XSEDE users, please use gsissh or the Single Sign On login hub.

To access SuperMIC, users must connect using a Secure Shell (SSH) client.

Linux and Mac Users - SSH client is already installed and can be accessed from the command prompt using the ssh command. One would issue a command similar to the following:

$ ssh -X username@smic.hpc.lsu.edu

The user is then prompted for a password. The -X flag sets up X11 forwarding automatically.

Windows Users - You will need to download and install an SSH client such as the PuTTY utility. If you need to log in with X11 forwarding, an X server must be installed and running on your local Windows machine. Xming X Server is recommended. Advanced users may also install Cygwin, which provides a command-line ssh client similar to the one available to Linux and Mac users.

If you have forgotten your password, or you wish to reset it, see here (click "Forgot your password?").

1.2. Help

To report a problem please run the ssh or gsissh command with the "-vvv" option and include the verbose information in the ticket.

2. File Transfer

2.1. SCP

scp is the easiest way to transfer single files.

Local File to Remote Host

% scp localfile user@remotehost:/destination/dir/or/filename

Remote Host to Local File

% scp user@remotehost:/remote/filename localfile

2.2. SFTP

Interactive Mode

One may find this mode very similar to the interactive interface offered by ftp. A login session may look similar to the following:

% sftp user@remotehost
(enter in password)
 ...
sftp>

The commands are similar to those offered by the outmoded ftp client programs: get, put, cd, pwd, lcd, etc. For more information on the available set of commands, consult the sftp man page.

% man sftp

Batch Mode

One may use sftp non-interactively in two cases.

Case 1: Pull a remote file to the local host.

% sftp user@remotehost:/remote/filename localfilename

Case 2: Create a special sftp batch file containing the set of commands one wishes to execute without any interaction.

% sftp -b batchfile user@remotehost

Additional information on constructing a batch file is available in the sftp man page.
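
As a minimal sketch of Case 2, a batch file is plain text with one sftp command per line. The file name batchfile matches the command above; the remote and local paths and the data file names are hypothetical placeholders, not real SuperMIC locations.

```shell
#!/bin/bash
# Write a hypothetical sftp batch file: one sftp command per line.
# All paths and file names below are placeholders.
cat > batchfile <<'EOF'
cd /remote/dir
lcd /local/dir
get results.dat
put input.dat
EOF

# Show the commands that sftp would execute.
cat batchfile

# To run it without interaction (needs network access and a login method
# that does not prompt, such as SSH keys):
#   sftp -b batchfile user@remotehost
```

Because batch mode cannot answer prompts, it is normally paired with non-interactive authentication.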

2.3. rsync Over SSH (preferred)

rsync is an extremely powerful program; it can synchronize entire directory trees, only sending data about files that have changed. That said, it is rather picky about the way it is used. The rsync man page has a great deal of useful information, but the basics are explained below.

Single File Synchronization

To synchronize a single file via rsync, use the following:

To send a file:

% rsync --rsh=ssh --archive --stats --progress localfile \
        username@remotehost:/destination/dir/or/filename

To receive a file:

% rsync --rsh=ssh --archive --stats --progress \
        username@remotehost:/remote/filename localfilename

Note that --rsh=ssh is not necessary with newer versions of rsync, but older installs will default to using rsh (which is not generally enabled on modern OSes).

Directory Synchronization

To synchronize an entire directory, use the following:

To send a directory:

% rsync --rsh=ssh --archive --stats --progress localdir/ \
        username@remotehost:/destination/dir/

% rsync --rsh=ssh --archive --stats --progress localdir \
        username@remotehost:/destination

To receive a directory:

% rsync --rsh=ssh --archive --stats --progress \
        username@remotehost:/remote/directory/ /some/localdirectory/

% rsync --rsh=ssh --archive --stats --progress \
        username@remotehost:/remote/directory /some/

Note the difference with the slashes. The second command will place the files in the directory /destination/localdir; the fourth will place them in the directory /some/directory. rsync is very particular about the placement of slashes. Before running any significant rsync command, add --dry-run to the parameters. This will let rsync show you what it plans on doing without actually transferring the files.
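
The slash rule can be tried safely on local throwaway directories before touching real data. The sketch below is only an illustration: the directory and file names are made up, and it assumes rsync is installed on the machine where it runs.

```shell
#!/bin/bash
# Local demonstration of rsync's trailing-slash rule using throwaway
# directories created with mktemp, so no real data is touched.
demo=$(mktemp -d)
mkdir -p "$demo/localdir" "$demo/dest_a" "$demo/dest_b"
echo "sample" > "$demo/localdir/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash on the source: copy the CONTENTS of localdir.
    rsync --archive "$demo/localdir/" "$demo/dest_a/"
    # No trailing slash: copy the directory itself into the destination.
    rsync --archive "$demo/localdir" "$demo/dest_b"
    # List the resulting files to compare the two layouts.
    (cd "$demo" && find dest_a dest_b -type f | sort)
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

The first form places file.txt directly in dest_a, while the second creates dest_b/localdir/file.txt.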

Synchronization with Deletion

This is very dangerous; a single mistyped character may blow away all of your data. Do not synchronize with deletion if you aren't absolutely certain you know what you're doing.

To have directory synchronization delete files on the destination system that don't exist on the source system:

% rsync --rsh=ssh --archive --stats --dry-run --progress \
        --delete localdir/ username@remotehost:/destination/dir/

Note that the above command will not actually delete (or transfer) anything; the --dry-run must be removed from the list of parameters to actually have it work.
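
A hedged local illustration of the preview behavior follows. The directories and file names are invented for the demonstration, and it assumes rsync is installed locally.

```shell
#!/bin/bash
# Local demonstration that --dry-run combined with --delete only reports
# what would be removed. Uses throwaway directories created with mktemp.
sync_demo=$(mktemp -d)
mkdir -p "$sync_demo/src" "$sync_demo/dst"
echo "keep" > "$sync_demo/src/keep.txt"
echo "stale" > "$sync_demo/dst/extra.txt"   # exists only on the destination

if command -v rsync >/dev/null 2>&1; then
    # Preview: rsync lists the planned deletion and transfer, but changes nothing.
    rsync --archive --verbose --dry-run --delete "$sync_demo/src/" "$sync_demo/dst/"
    # extra.txt is still present, and keep.txt has not been copied.
    ls "$sync_demo/dst"
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

Reviewing the preview output for unexpected "deleting" lines is a cheap safeguard before removing --dry-run.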

2.4. BBCP

Use bbcp to transfer large data files without encryption.

% bbcp [opt] user@source:/path/to/data user@destination:/path/to/store/data

Possible options include:

-P 2: Give a progress report every 2 seconds
-w 2M: TCP window size of 2MBytes
-s 16: Set the number of streams to 16 (default is 4)

Other options may be necessary if bbcp is not installed in a regular location on either end of the transfer. This can lead to rather complex command lines:

$ bbcp -z -T \
  "ssh -x -a -oFallBackToRsh=no %I -l %U %H /home/user/Custom/bin/bbcp" \
  foobar-5.4.14.tbz "ruser@10.20.30.40:foo.tbz"

2.5. Client Software

scp and sftp

Standard Clients

The command-line scp and sftp tools come with any modern distribution of OpenSSH; this is generally installed by default on modern Linux, UNIX, and Mac OS X installs.

Windows Clients

Windows clients include:

(PuTTY-related command line utilities), and

scp, sftp, & rsync as provided by Cygwin.

*** VERY IMPORTANT ***: if you use FileZilla, please use the Site Manager feature (under "File") to manage the profile of the cluster you use. In the "Transfer Settings" tab, make sure that the "Limit number of simultaneous connections" box is checked and the "Maximum number of connections" is set to 1. Failing to do so may result in FileZilla creating excessive ssh connections, which could lead to the suspension of your user account.

3. Computing Environment

3.1. Shell

SuperMIC's default shell is bash. Other shells are available: sh, csh, tcsh, and ksh. Users may change their default shell by logging into their HPC Profile page at https://accounts.hpc.lsu.edu.

3.2. Modules

SuperMIC makes use of modules to allow for adding software to the user's environment.

The following is a guide to managing your software environment with modules.

The Environment Modules package provides for dynamic modification of your shell environment. Module commands set, change, or delete environment variables, typically in support of a particular application. They also let the user choose between different versions of the same software or different combinations of related codes. Complete documentation is available in the module(1) and modulefile(4) manpages.

3.2.1. Default Environment

The default environment is defined in the .modules file under each user's home directory. Edit this file if you would like to change the default environment.

3.2.2. Useful Module Commands

Command	Description
module list	List the modules that are currently loaded
module avail	List the modules that are available
module display <module name>	Show the environment variables used by <module name> and how they are affected
module unload <module name>	Remove <module name> from the environment
module load <module name>	Load <module name> into the environment
module swap <module one> <module two>	Replace <module one> with <module two> in the environment

3.2.3. Loading and unloading modules

You must remove some modules before loading others. Some modules depend on others, so they may be loaded or unloaded as a consequence of another module command. For example, if intel and mvapich are both loaded, running the command module unload intel will automatically unload mvapich. Subsequently issuing the module load intel command does not automatically reload mvapich.

4. File Systems

File System Summary
File system name	Access point	Type of file system	Quota	Time until purged	Best for
Home	/home/<your user name>	NFS	5 GB	Never	Code in development, compiled executables
Work (scratch)	/work/<your user name>	Lustre	Unlimited	60 days	Job input/output
Project	/project/<your user name>	Lustre	Varies	12 months, can be longer upon renewal	Storage space for a specific project (NOT meant for archival purposes)

User-owned storage on the SuperMIC system is available in two directories: home (/home/<your user name>) and work (/work/<your user name>). These directories are on separate file systems, and accessible from any node in the system. The work directory is created automatically within an hour of first login. If your work directory does not exist when you log in, please wait at least an hour before contacting the HPC helpdesk.

4.1. Home Directory

The /home file system quota on SuperMIC is 5 GB. Files can be stored on /home permanently, which makes it an ideal place for your source code and executables. The /home file system is meant for interactive use such as editing and active code development. Do not use /home for batch job I/O.

4.2. Work (Scratch) Directory

The /work (/scratch) directories are created automatically within an hour of first login. The /work volume is meant for the input and output of executing batch jobs and not for long term storage. We expect files to be moved off to other locations or deleted in a timely manner, usually within 30-120 days. For performance reasons, our policy on all volumes is to limit the number of files per directory to around 10,000 and the total number of files to about 500,000.

The /work file system quota on SuperMIC is unlimited. If the volume becomes over-utilized, we will enforce a purge policy: files are deleted starting with the oldest last-accessed dates and the largest files, continuing until utilization of the volume has been reduced below 80%. An email message will be sent out weekly to users who may have files subject to purge, informing them of their /work utilization. If disk space should become critically low, more drastic measures may be required to keep the system stable.

Please do not attempt to circumvent the removal process by manually changing file dates. The /work volume capacity is not unlimited, and attempts to circumvent the purge process may adversely affect others and lead to access restrictions to the /work volume or even the cluster.
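
To see which files might be at risk, and whether a directory is approaching the per-directory file guideline, something like the following sketch can be used. The function names are hypothetical, the 60-day threshold is taken from the file system summary table, and the -maxdepth option assumes GNU or BSD find.

```shell
#!/bin/bash
# List regular files under a directory whose last access is more than
# the given number of days in the past.
# Usage: list_stale_files <directory> <days>
list_stale_files() {
    local dir=$1 days=$2
    find "$dir" -type f -atime +"$days" -print
}

# Count the regular files directly inside one directory, for comparison
# with the guideline of around 10,000 files per directory.
# Usage: count_dir_files <directory>
count_dir_files() {
    local dir=$1
    find "$dir" -maxdepth 1 -type f | wc -l
}

# Example calls (paths assume the /work/<your user name> layout):
#   list_stale_files "/work/$USER" 60
#   count_dir_files "/work/$USER/myjob"
```

Both functions only read metadata; they do not modify or delete anything.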

4.3. Project Directory

The /project file system is a quota-controlled space granted via an allocation system that allows large amounts of space to be shared for periods of 12 months or longer. The process is similar to requesting an allocation of system units, but is granted in 100 GB units for 6 months at a time, subject to renewal and demand. Visit the Storage Policy page for more details on who may apply and its intended uses. Qualified individuals may apply for one on the Storage Allocation Request page.

4.4. Local Scratch (/var/scratch) Directory

Local scratch (/var/scratch) space is provided on all compute nodes, and is local to each node (i.e. files stored in /var/scratch cannot be accessed by other nodes). The size of this file system will vary from system to system, and possibly across nodes within a system. This is the preferred place to put any intermediate files required while a job is executing. Once the job ends, the files it stores in /var/scratch are subject to deletion. Users should not have any expectation that files will exist after a job terminates, and are expected to move the data from /var/scratch to their /work or /home directory as part of the clean up process in their job script.
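
The clean-up step described above might look like the following sketch inside a job script. The variable names SCRATCH_BASE, WORK_DIR and JOB_SCRATCH and the file result.dat are hypothetical, and the fallback to a temporary directory exists only so the fragment can be tried outside a job.

```shell
#!/bin/bash
# Sketch of staging intermediate files through node-local scratch.
# Assumption: /var/scratch is writable on the compute node. When it is not
# (for example, when trying this outside a job), a temporary directory is used.
SCRATCH_BASE=/var/scratch
if [ ! -d "$SCRATCH_BASE" ] || [ ! -w "$SCRATCH_BASE" ]; then
    SCRATCH_BASE=$(mktemp -d)
fi

# WORK_DIR would normally be something like /work/$USER/myjob (hypothetical).
WORK_DIR=${WORK_DIR:-$(mktemp -d)}
mkdir -p "$WORK_DIR"

# Private scratch directory for this job.
JOB_SCRATCH=$(mktemp -d "$SCRATCH_BASE/job.XXXXXX")

# Run in local scratch; the echo stands in for the real application.
cd "$JOB_SCRATCH" || exit 1
echo "intermediate data" > result.dat

# Clean-up step: copy results back, then remove the scratch directory.
cd "$WORK_DIR" || exit 1
cp -r "$JOB_SCRATCH"/. "$WORK_DIR"/
rm -rf "$JOB_SCRATCH"
echo "Results saved in $WORK_DIR"
```

Copying back explicitly at the end of the script keeps the results even if the node's scratch space is wiped right after the job terminates.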

5. Application Development

The Intel, GNU and Portland Group (PGI) C, C++ and Fortran compilers are installed on SuperMIC and they can be used to create OpenMP, MPI, hybrid and serial programs. The commands you should use to create each of these types of programs are shown in the table below.

Intel compilers are loaded by default. Codes can be compiled according to the following charts:

Intel Compiler Commands
	Serial Codes	MPI Codes	OpenMP Codes	Hybrid Codes
Fortran	ifort	mpiifort	ifort -openmp	mpiifort -openmp
C	icc	mpiicc	icc -openmp	mpiicc -openmp
C++	icpc	mpiicpc	icpc -openmp	mpiicpc -openmp

GNU Compiler Commands
	Serial Codes	MPI Codes	OpenMP Codes	Hybrid Codes
Fortran	gfortran	mpif90	gfortran -fopenmp	mpif90 -fopenmp
C	gcc	mpicc	gcc -fopenmp	mpicc -fopenmp
C++	g++	mpiCC	g++ -fopenmp	mpiCC -fopenmp

PGI Compiler Commands
	Serial Codes	MPI Codes	OpenMP Codes	Hybrid Codes
Fortran	pgf90	mpif90	pgf90 -mp	mpif90 -mp
C	pgcc	mpicc	pgcc -mp	mpicc -mp
C++	pgCC	mpiCC	pgCC -mp	mpiCC -mp

Default MPI: mvapich2 2.0 compiled with Intel compiler version 14.0.2

To compile a serial program, the syntax is: <your choice of compiler> <compiler flags> <source file name>. For example, the command below compiles the source file mysource.f90 and generates the executable myexec.

$ ifort -o myexec mysource.f90

To compile an MPI program, the syntax is the same, except that one needs to replace the serial compiler with an MPI one listed in the table above:

$ mpif90 -o myexec_par my_parallel_source.f90

5.2. GPU Programming

CUDA Programming

NVIDIA's CUDA compiler and libraries are accessed by loading the CUDA module:

module load cuda

Use the nvcc compiler on the head node to compile code, and run executables on nodes with GPUs (one of the head nodes has a GPU). SuperMIC's K20X GPUs are compute capability 3.5 devices. When compiling your code, make sure to specify this level of capability with:

nvcc -arch=compute_35 -code=sm_35 ...

GPU nodes are accessible through the gpu queue for production work.

OpenACC Programming

OpenACC is the name of an application program interface (API) that uses a collection of compiler directives to accelerate applications that run on multicore and GPU systems. The OpenACC compiler directives specify regions of code that can be offloaded from a CPU to an attached accelerator. A quick reference guide is available here.

Currently, only the Portland Group compilers installed on SuperMIC can be used to compile C and Fortran code annotated with OpenACC directives.

To load the PGI compilers:

module load pgi

To compile a C code annotated with OpenACC directives:

pgcc -acc -ta=nvidia -Minfo=accel code.c -o code.exe

6. Running Applications

SuperMIC uses SLURM to manage user jobs. Whether you run in batch mode or interactively, you will access the compute nodes using the SLURM commands described below. Remember that computationally intensive jobs should be run only on the compute nodes and not the login nodes. More details on submitting jobs and SLURM commands can be found here.

6.1. Available Partitions (Queues) on SuperMIC

Below are the possible job queues to choose from:

single - Used for jobs that will only execute on a single node, i.e. nodes=1:ppn=1-20. It has a wallclock limit of 72 hours (3 days). Jobs in the single queue should not use more than 3GB memory per core. If applications require more memory, scale the number of cores (ppn) to the amount of memory required i.e. max memory available for jobs in single queue is 12GB for ppn=4.
workq - Used for jobs that will use at least one node, i.e. nodes>=1:ppn=20. Currently, this queue has a wallclock limit of 72 hours (3 days). Jobs in workq are not preemptable, which means that running jobs will not be disrupted before completion.
checkpt - Used for jobs that will use at least one node. Jobs in the checkpt queue can be preempted if needed.
bigmem - Used for jobs that want to use the 256 GB nodes. This queue has a wallclock limit of 72 hours (3 days).
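
The memory scaling rule for the single queue can be sketched as a small calculation. The function name is hypothetical; the 3 GB-per-core figure and the 20-core node size are the values stated in this guide.

```shell
#!/bin/bash
# Number of cores to request in the single queue for a given memory need,
# assuming 3 GB of memory per core and at most 20 cores on a node.
# Usage: cores_for_memory <memory in GB>
cores_for_memory() {
    local mem_gb=$1 per_core_gb=3 max_cores=20
    # Integer division rounded up.
    local cores=$(( (mem_gb + per_core_gb - 1) / per_core_gb ))
    (( cores < 1 )) && cores=1
    if (( cores > max_cores )); then
        echo "request exceeds ${max_cores} cores at ${per_core_gb} GB per core" >&2
        return 1
    fi
    echo "$cores"
}

cores_for_memory 12    # prints 4
cores_for_memory 10    # prints 4
```

A job needing 12 GB would therefore request 4 cores, matching the example given for the single queue.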

Queue Name	Max Walltime (hours)	Max Nodes (per user)	Allowed Cores per Node
workq	72	86	20
checkpt	72	86	20
single	168	86	1-20
bigmem	72	3	28
v100	72	2	36

The available queues and actual limit settings can be verified by running the command:

sinfo -s

6.2. Job Submission

SLURM (Simple Linux Utility for Resource Management) is an open source, highly scalable cluster management and job scheduling system. It is used for managing job scheduling on new HPC and LONI clusters. It was originally created at the Livermore Computing Center, and has grown into full-fledged open-source software backed by a large community, commercially supported by the original developers, and installed in many of the Top-500 supercomputers.

Information about the following topics can be found here:

Submitting batch script (single node)
Submitting batch script (multiple nodes)
Submitting interactive jobs
Jobs Using GPUs
Commonly used SLURM Commands

Submitting batch script (single node)

To create a batch SLURM script, use your favorite editor (e.g. vi or emacs) to create a text file containing both SLURM directives and the commands that run your job. All SLURM directives (special instructions) are prefixed with #SBATCH. Below is an example of a SLURM batch job script:

 #!/bin/bash
 #SBATCH -N 1               # request one node
 #SBATCH -t 2:00:00	        # request two hours
 #SBATCH -p single          # in single partition (queue)
 #SBATCH -A your_allocation_name
 #SBATCH -o slurm-%j.out-%N # optional, name of the stdout, using the job number (%j) and the hostname of the node (%N)
 #SBATCH -e slurm-%j.err-%N # optional, name of the stderr, using job and hostname values
 # below are job commands
 date

 # Set some handy environment variables.

 export HOME_DIR=/home/$USER/myjob
 export WORK_DIR=/work/$USER/myjob
 
 # Make sure the WORK_DIR exists:
 mkdir -p $WORK_DIR
 # Copy files, jump to WORK_DIR, and execute a program called "mydemo"
 cp $HOME_DIR/mydemo $WORK_DIR
 cd $WORK_DIR
 ./mydemo
 # Mark the time it finishes.
 date
 # exit the job
 exit 0

To submit the above job to the scheduler, save the above script as a text file, e.g., singlenode.sh, then use the below command to submit:

$ sbatch singlenode.sh

List of useful SLURM directives and their meaning:

#SBATCH -A allocationname: short for --account, charge jobs to your allocation named allocationname.
#SBATCH -J: short for --job-name, name of the job.
#SBATCH -n : short for --ntasks, number of tasks (CPU cores) to run job on. The memory limit for jobs is 4 GB of MEM per CPU core requested.
#SBATCH -N : short for --nodes, number of nodes on which to run.
#SBATCH -c : short for --cpus-per-task, number of threads per process.
#SBATCH -p partition: short for --partition, submit job to the partition queue.
- Allowed values for partition: single, checkpt, workq, gpu, bigmem.
- Depending on the cluster, additional partitions can be found via the sinfo command.
#SBATCH -t hh:mm:ss: short for --time, request resources to run job for hh hours, mm minutes and ss seconds.
#SBATCH -o filename.out: short for --output, write standard output to file filename.out.
#SBATCH -e filename.err: short for --error, write standard error to file filename.err.
- Note that by default, SLURM will merge standard error and standard output into one file if no "-o" or "-e" flag is set.
#SBATCH --mail-user your@email.address: Address to send email to when the --mail-type directive below is triggered.
#SBATCH --mail-type type: Send an email after job status type occurs. Common values for type include BEGIN, END, FAIL or ALL. The arguments can be combined, e.g. BEGIN,END will send email when the job begins and ends.

List of common useful SLURM environmental variables and their meaning:

SLURM_JOBID: Job ID number given to this job
SLURM_JOB_NODELIST: List of nodes allocated to the job
SLURM_SUBMIT_DIR: Directory where the sbatch command was executed
SLURM_NNODES: Total number of nodes in the job's resource allocation.
SLURM_NTASKS: Total number of CPU cores requested in a job.
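
A short, hedged sketch of how these variables might be inspected from a job script follows. The function name is hypothetical; outside of a job the variables are unset, so a placeholder is printed instead.

```shell
#!/bin/bash
# Print the SLURM variables listed above. Outside of a job they are unset,
# so each one falls back to the placeholder text "(not set)".
print_slurm_env() {
    local name
    for name in SLURM_JOBID SLURM_JOB_NODELIST SLURM_SUBMIT_DIR SLURM_NNODES SLURM_NTASKS; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name.
        printf '%-20s %s\n' "$name:" "${!name:-(not set)}"
    done
}

print_slurm_env
```

Placing a call like this near the top of a job script records the job's allocation in its output file, which can help when reporting a problem.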

Submitting batch script (multiple nodes)

Creating a multiple-node job script is very similar to creating a single-node one; the difference is that multiple nodes are requested. Below is an example of a multiple-node batch job script:

 #!/bin/bash
 #SBATCH -N 2                	# request two nodes
 #SBATCH -n 16 		       	# specify 16 MPI processes (8 per node)
 #SBATCH -c 6			# specify 6 threads per process
 #SBATCH -t 2:00:00
 #SBATCH -p checkpt
 #SBATCH -A your_allocation_name
 #SBATCH -o slurm-%j.out-%N # optional, name of the stdout, using the job number (%j) and the first node (%N)
 #SBATCH -e slurm-%j.err-%N # optional, name of the stderr, using job and first node values
 # below are job commands
 date

 # Set some handy environment variables.

 export HOME_DIR=/home/$USER/myjob
 export WORK_DIR=/work/$USER/myjob
 
 # load appropriate modules, in this case Intel compilers, MPICH
 module load mpich/3.1.4/INTEL-15.0.3
 # Make sure the WORK_DIR exists:
 mkdir -p $WORK_DIR
 # Copy files, jump to WORK_DIR, and execute a program called "my_mpi_demo"
 cp $HOME_DIR/my_mpi_demo $WORK_DIR
 cd $WORK_DIR
 srun -N2 -n16 -c6 ./my_mpi_demo # Launch the MPI application on two nodes with 16 MPI processes in total (8 per node) and 6 threads per MPI process.
 # Mark the time it finishes.
 date
 # exit the job
 exit 0

Note: in the examples above, the srun command is used to launch the MPI application. This will be the default behavior.

The syntax for the srun command is:

srun <flags> <name of the MPI executable>

Some useful flags are:

-N: number of nodes
-n: total number of MPI processes
-c: number of threads per MPI process
-u: turn on unbuffered output (the output from MPI processes will be flushed to stdout as soon as it is generated); without this flag, Slurm will buffer and rearrange the output according to the MPI ranks.
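
For hybrid MPI/OpenMP codes, the thread count given with -c usually has to reach the application as well. The fragment below is a sketch under two assumptions: that the program reads OMP_NUM_THREADS, and that the job was submitted with -c so that SLURM exports SLURM_CPUS_PER_TASK.

```shell
#!/bin/bash
# Match the OpenMP thread count to the value given with -c.
# SLURM_CPUS_PER_TASK is exported by SLURM when -c is specified;
# fall back to 1 thread when it is not set.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Using $OMP_NUM_THREADS OpenMP thread(s) per MPI process"

# Launch line, shown as a comment because it only works inside a SLURM job.
# my_mpi_demo is the executable name used in the example script:
#   srun -n "$SLURM_NTASKS" -c "$SLURM_CPUS_PER_TASK" ./my_mpi_demo
```

Reusing the SLURM variables in the launch line keeps the srun flags in step with the #SBATCH directives when those are changed later.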

Submitting interactive jobs

To start an interactive job, use the salloc command similar to the example below:

 salloc -t 1:00:00 -n8 -N1 -A your_allocation_name -p single

Similar to the batch job script, the -n denotes 8 tasks (cores), the -N denotes 1 compute node. The complete form of the above command can be:

 salloc --time=1:00:00 --ntasks=8 --nodes=1 --account=your_allocation_name --partition=single

Note:

If an interactive job session is submitted to a partition other than single, the -n or --ntasks flag will be ignored and one or more entire nodes will be allocated to the job.
Our recommendation is to specify your allocation name (-A your_allocation_name) to the salloc command so a proper allocation can be used by the scheduler.

Jobs Using GPUs

For jobs using GPUs, the number of GPU devices must be explicitly specified using the “--gres=gpu:” flag.

Requesting One GPU

If a job cannot use multiple GPU devices efficiently, or if it is a test job, the user should request one GPU. In this case, the job will share a node with other jobs.

For an interactive session requesting one GPU:

salloc -t hh:mm:ss -N1 -n16 --gres=gpu:1 -p gpu_partition_name -A your_allocation_name

For a batch job requesting one GPU:

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 16
#SBATCH -t hh:mm:ss
#SBATCH -p gpu_partition_name
#SBATCH --gres=gpu:1
#SBATCH -A your_allocation_name

commands to run

Please note that the valid value for the number of tasks ("-n") is between 1 and (total number of CPU cores on the node)/(total number of GPUs on the node). For instance, if a job requests one GPU on a node with 64 cores and 4 GPUs, the valid value for "-n" is from 1 to 64/4=16.

Requesting More Than One GPU (But Less Than A Node)

Users can request more than one GPU on a node (e.g. 2 or 3 GPUs on a node with 4 GPUs). In this case, the job will also share a node with other jobs.

For an interactive session requesting multiple GPUs:

salloc -t hh:mm:ss -N1 -n32 --gres=gpu:2 -p gpu_partition_name -A your_allocation_name

For a batch job requesting multiple GPUs:

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 32
#SBATCH -t hh:mm:ss
#SBATCH -p gpu_partition_name
#SBATCH --gres=gpu:2
#SBATCH -A your_allocation_name

commands to run

Please note that the valid value for the number of tasks ("-n") is between 1 and (number of GPUs requested)*(total number of CPU cores on the node)/(total number of GPUs on the node). For instance, if a job requests 2 GPUs on a node with 64 cores and 4 GPUs, the valid value for "-n" is from 1 to 2*64/4=32.
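
The upper bound for "-n" can be computed with a small helper such as the sketch below. The function name is hypothetical; the formula is the one stated in the notes for GPU requests.

```shell
#!/bin/bash
# Largest valid "-n" value for a GPU request, following the rule:
#   (GPUs requested) * (CPU cores on the node) / (GPUs on the node)
# Usage: max_gpu_tasks <gpus_requested> <cores_per_node> <gpus_per_node>
max_gpu_tasks() {
    local requested=$1 cores=$2 gpus=$3
    if (( requested < 1 || requested > gpus )); then
        echo "requested GPUs must be between 1 and $gpus" >&2
        return 1
    fi
    echo $(( requested * cores / gpus ))
}

max_gpu_tasks 1 64 4    # prints 16
max_gpu_tasks 2 64 4    # prints 32
```

If the same rule applies to the v100 nodes (36 cores and two GPUs per node), a one-GPU request there would allow up to max_gpu_tasks 1 36 2, that is 18 tasks.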

Requesting One GPU Node With All Its GPUs

For an interactive session requesting one GPU node with all its GPUs (either 2 or 4, depending on the node configuration):

salloc -t hh:mm:ss -N1 -n64 --gres=gpu:number_of_gpus -p gpu_partition_name -A your_allocation_name

For a batch job requesting one GPU node and all its GPUs:

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 64
#SBATCH -t hh:mm:ss
#SBATCH -p gpu_partition_name
#SBATCH --gres=gpu:number_of_gpus
#SBATCH -A your_allocation_name

commands to run

Requesting Multiple GPU Nodes With All Their GPUs

For an interactive session requesting multiple GPU nodes with all their GPUs (either 2 or 4, depending on the node configuration):

salloc -t hh:mm:ss -N number_of_gpu_nodes --gres=gpu:number_of_gpus -p gpu_partition_name -A your_allocation_name

For a batch job requesting multiple GPU nodes with all their GPUs:

#!/bin/bash
#SBATCH -N number_of_gpu_nodes
#SBATCH -t hh:mm:ss
#SBATCH -p gpu_partition_name
#SBATCH --gres=gpu:number_of_gpus
#SBATCH -A your_allocation_name

commands to run

Please note that the value of "number_of_gpus" is the number of GPUs PER NODE, not the total number of GPUs that will be allocated to the job. For instance, when requesting 2 nodes with 4 GPUs on each node, the flag should be "--gres=gpu:4".

Commonly used SLURM Commands

squeue is used to show the partition (queue) status. Useful options:
- -l ("l" for "long"): gives more verbose information
- -u someusername: limit output to jobs by username
- --state=pending: limit output to pending (i.e. queued) jobs
- --state=running: limit output to running jobs
Below is an example that queries all jobs submitted by the current user (fchen14):
```
[fchen14@smic2 ~]$ squeue -u $USER
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       340   checkpt     bash  fchen14  R    1:06:59      1 smic002
       339   checkpt     bash  fchen14  R    1:07:09      1 smic001
```

sinfo is used to view information about SLURM nodes and partitions. Typical usage:

[fchen14@smic001 test]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug        up   infinite      3   idle smic[026-027,032]
checkpt*     up 3-00:00:00      2  alloc smic[001-002]
checkpt*     up 3-00:00:00     23   idle smic[003-025]
single       up 7-00:00:00      2  alloc smic[001-002]
single       up 7-00:00:00     23   idle smic[003-025]
bigmem       up 7-00:00:00      2   idle smic[033-034]

scancel is used to signal or cancel jobs. Typical usage with squeue:

[fchen14@smic1 ~]$ squeue -u fchen14
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               341   checkpt     bash  fchen14  R       0:13      1 smic001
               340   checkpt     bash  fchen14  R    1:50:57      1 smic002
# cancel (delete) job with JOBID 340			   
[fchen14@smic1 ~]$ scancel 340
# job status might display a temporary "CG" ("CompletinG") status immediately after scancel
[fchen14@smic1 ~]$ squeue -u fchen14 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               340   checkpt     bash  fchen14 CG    1:51:08      1 smic002
               341   checkpt     bash  fchen14  R       0:41      1 smic001
[fchen14@smic1 ~]$ squeue -u fchen14 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               341   checkpt     bash  fchen14  R       1:08      1 smic001

scontrol is used to view or modify SLURM configuration and state. Typical usage for the user is to check job status:

[fchen14@smic1 ~]$ squeue -u fchen14 # show all jobs
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               341   checkpt     bash  fchen14  R    1:29:20      1 smic001
[fchen14@smic1 ~]$ scontrol show job 341
JobId=341 JobName=bash
   UserId=fchen14(32584) GroupId=Admins(10000) MCS_label=N/A
   Priority=1 Nice=0 Account=hpc_hpcadmin6 QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=01:29:31 TimeLimit=12:00:00 TimeMin=N/A
   SubmitTime=2020-05-07T10:47:52 EligibleTime=2020-05-07T10:47:52
   AccrueTime=Unknown
   StartTime=2020-05-07T10:47:52 EndTime=2020-05-07T22:47:57 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-05-07T10:47:52
   Partition=checkpt AllocNode:Sid=smic1:28374
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=smic001
   BatchHost=smic001
   NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=8,mem=22332M,node=1,billing=8
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=22332M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=/bin/bash
   WorkDir=/home/fchen14/test
   Power=

More detailed information on the SLURM commands to schedule and monitor jobs can be found at Slurm on-line documentation.
