Cluster Usage Guide

System Access to Clusters

Login guide

Transferring files to and from clusters

File transfer guide

Computing Environment

Shell

The default shell is bash. Other shells are available: sh, csh, tcsh, and ksh.

Users may change their default shell via the HPC Profile page.

Modules

Guide to managing your software environment with modules.

File Systems

Cluster-specific file system policy and storage quota information

Application Development

Application Development guide

GPU Programming guide

Running Applications

HPC clusters use Slurm to manage user jobs. Whether you run in batch mode or interactively, you access the compute nodes through Slurm commands, as described below.

Remember that computationally intensive jobs should be run only on the compute nodes and not the login nodes.
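
As a rough illustration of the batch route, the sketch below writes a small job script and asks Slurm to validate it. The file name demo_job.sh, the job name, and the resource values are hypothetical; the partition name single comes from the partition list in this guide. The --test-only option of sbatch validates the script and reports an estimated start time without submitting anything.

```shell
#!/bin/bash
# Write a small batch script. sbatch reads the #SBATCH lines at submission.
cat > demo_job.sh <<'EOF'
#!/bin/bash
# Hypothetical job name and illustrative resource values.
#SBATCH --job-name=demo
#SBATCH --partition=single
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
# %j in the output file name expands to the job ID.
#SBATCH --output=demo-%j.out

# This command runs on the allocated compute node, not the login node.
srun hostname
EOF

# Validate only when Slurm is installed; --test-only submits nothing.
if command -v sbatch >/dev/null 2>&1; then
    sbatch --test-only demo_job.sh
else
    echo "sbatch not found; run this on a cluster login node"
fi
```

For interactive work, a command along the lines of srun --partition=single --nodes=1 --ntasks=4 --time=01:00:00 --pty bash requests a shell on a compute node with the same resources. Check the job submission guide for the options recommended on your cluster.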

Learn how to submit jobs.

View the official cheat sheet for an overview of available commands.

Convert commands from PBS or other job schedulers to Slurm.

Generally Available Partitions

Below are the available job partitions (previously known as queues) to choose from:

single - Used for jobs running on a single node with fewer than 64 cores, i.e., --nodes=1 with --ntasks from 1 to 63. It has a time limit of 168 hours (7 days). Jobs in the single partition should not use more than MEMORY/NCPUS memory per core, where MEMORY is the node's total memory and NCPUS is its core count. If an application requires more memory, increase the number of cores requested (--ntasks) in proportion to the memory required.
workq - Used for jobs that will use at least one node, i.e., --nodes=1 or more, and where all cores will be allocated. Currently, this partition has a time limit of 72 hours (3 days). Jobs in workq are not preemptible, which means that running jobs will not be disrupted before completion.
checkpt - Same as workq but jobs in the checkpt partition can be preempted if needed.
bigmem - Used for jobs requiring 2 TB memory nodes. This partition has a time limit of 72 hours (3 days).
gpu - Used for jobs that run applications capable of utilizing the NVIDIA GPUs.
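
The MEMORY/NCPUS rule for the single partition can be turned into a quick calculation. The sketch below is only an illustration: the function name cores_for_memory is made up for this example, and the 256 GB, 64-core node is an assumed configuration, so substitute the real values for your cluster.

```shell
#!/bin/bash
# cores_for_memory NEED_GB NODE_MEM_GB NODE_CPUS
# Prints the smallest core count whose share of node memory
# (NODE_MEM_GB / NODE_CPUS per core) covers NEED_GB.
cores_for_memory() {
    local need_gb=$1 node_mem_gb=$2 node_cpus=$3
    # Integer ceiling of need_gb * node_cpus / node_mem_gb.
    echo $(( (need_gb * node_cpus + node_mem_gb - 1) / node_mem_gb ))
}

# Assumed node: 256 GB and 64 cores, which gives 4 GB per core.
# A job needing 20 GB should therefore request 5 cores.
cores=$(cores_for_memory 20 256 64)
echo "request --ntasks=${cores}"
```

Requesting cores in proportion to memory keeps a single-partition job within its share of the node, even if some of those cores sit idle.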

View our cluster-specific partition details.

The current partition and limit settings can be verified by running:

sinfo -s
scontrol show partition
sacctmgr show qos format=name,MaxSubmitJob,maxtresperuser,maxjobsperuser,maxsubmitjobsperuser,maxtresperaccount,grptres
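
The output of scontrol show partition is a series of KEY=VALUE fields, so a single limit can be picked out with standard text tools. The following is a sketch rather than a site-provided utility: the helper name max_time is hypothetical, and the fallback line is an illustrative sample, not output captured from a real cluster.

```shell
#!/bin/bash
# Print the value of the first MaxTime field found on stdin.
max_time() {
    grep -o 'MaxTime=[^ ]*' | head -n 1 | cut -d= -f2
}

if command -v scontrol >/dev/null 2>&1; then
    # On a cluster, query the workq partition directly.
    scontrol show partition workq | max_time
else
    # Illustrative sample line only, used when Slurm is not installed.
    echo "   MaxNodes=UNLIMITED MaxTime=3-00:00:00 MinNodes=1" | max_time
fi
```

Slurm reports time limits as days-hours:minutes:seconds, so a 72-hour limit appears as 3-00:00:00.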

Job Submission

Slurm (Simple Linux Utility for Resource Management) is an open-source, highly scalable cluster management and job scheduling system. It is used for managing job scheduling on new HPC and LONI clusters. It was originally created at the Livermore Computing Center and has grown into full-fledged open-source software that is backed by a large community, commercially supported by the original developers, and installed on many of the TOP500 supercomputers.

Information about the following topics can be found here: