Cluster Usage Guide¶
System Access to Clusters¶
Transferring files to and from clusters¶
Computing Environment¶
Shell¶
The default shell is bash. Other shells are available: sh, csh, tcsh, and ksh.
Users may change their default shell via the HPC Profile page.
Modules¶
Guide to managing your software environment with modules.
File Systems¶
Cluster-specific file system policy and storage quota information.
Application Development¶
Running Applications¶
HPC clusters use Slurm to manage user jobs. Whether you run in batch mode or interactively, you will access the compute nodes using Slurm commands as described below.
Remember that computationally intensive jobs should be run only on the compute nodes and not the login nodes.
View the official cheat sheet for an overview of available commands.
Convert commands from PBS or other job schedulers to their Slurm equivalents.
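As a rough illustration of batch mode, the sketch below writes a minimal job script to a file. The file name `example_job.sh`, the job name, the allocation name `your_allocation`, and the executable `./my_app` are all hypothetical placeholders, and the resource values are examples only. The `single` partition used here is one of the partitions described later in this guide.

```shell
#!/bin/bash
# Write a minimal Slurm batch script to example_job.sh (hypothetical file name).
# Every value below is an illustrative assumption; adjust it to your own job.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example            # hypothetical job name
#SBATCH --partition=single            # partition to run in
#SBATCH --nodes=1                     # single-node job
#SBATCH --ntasks=8                    # number of tasks (cores) requested
#SBATCH --time=02:00:00               # wall-clock limit, HH:MM:SS
#SBATCH --account=your_allocation     # placeholder allocation name
#SBATCH --output=example-%j.out       # %j expands to the job ID

# Launch the (hypothetical) application on the allocated compute node.
srun ./my_app
EOF

echo "Wrote example_job.sh"
```

On a cluster, the script would then be submitted with `sbatch example_job.sh`. For interactive work, a command along the lines of `srun --partition=single --nodes=1 --ntasks=8 --time=01:00:00 --pty bash` requests a shell on a compute node, though the recommended interactive workflow may differ from cluster to cluster.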
Generally Available Partitions¶
Below are the available job partitions (previously known as queues) to choose from:
- single - Used for jobs running on a single node while using fewer than 64 cores, i.e., --nodes=1 with --ntasks from 1 to 63. It has a time limit of 168 hours (7 days). Jobs in the single partition should not use more than MEMORY/NCPUS memory per core. If an application requires more memory, scale the number of cores (ppn) to the amount of memory required.
- workq - Used for jobs that will use at least one node, i.e., --nodes=2 or more, and where all cores will be allocated. Currently, this partition has a time limit of 72 hours (3 days). Jobs in workq are not preemptible, which means that running jobs will not be disrupted before completion.
- checkpt - Same as workq, but jobs in the checkpt partition can be preempted if needed.
- bigmem - Used for jobs requiring 2 TB memory nodes. This partition has a time limit of 72 hours (3 days).
- gpu - Used for jobs that run applications capable of utilizing the NVIDIA GPUs.
View our cluster-specific partition details.
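The MEMORY/NCPUS rule for the single partition can be turned into a quick calculation. The sketch below is only an illustration: the helper name `cores_for_memory` is hypothetical, and the node size (256 GB of memory, 64 cores) is an assumed example, not the specification of any particular cluster.

```shell
#!/bin/bash
# cores_for_memory NEED_GB NODE_MEM_GB NODE_CPUS
# Prints the smallest number of cores whose share of node memory
# (NODE_MEM_GB / NODE_CPUS per core) covers NEED_GB.
# Hypothetical helper for illustration; not part of Slurm.
cores_for_memory() {
    local need_gb=$1 node_mem_gb=$2 node_cpus=$3
    # Integer ceiling of need_gb * node_cpus / node_mem_gb.
    local cores=$(( (need_gb * node_cpus + node_mem_gb - 1) / node_mem_gb ))
    # Always request at least one core.
    if (( cores < 1 )); then
        cores=1
    fi
    echo "$cores"
}

# Assumed node: 256 GB and 64 cores, i.e. 4 GB per core.
# A job that needs 20 GB should therefore request 5 cores.
cores_for_memory 20 256 64
```

The result would be used as the `--ntasks` value of a single-partition job. If the calculation returns 64 or more on a 64-core node, the job no longer fits the single partition's limit of 63 tasks and another partition is likely a better fit.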
The current partition and limit settings can be verified by running:
sinfo -s
scontrol show partition
sacctmgr show qos format=name,MaxSubmitJob,maxtresperuser,maxjobsperuser,maxsubmitjobsperuser,maxtresperaccount,grptres
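Slurm reports time limits in a days-hours:minutes:seconds form, so the 168-hour limit above typically appears as `7-00:00:00`. The sketch below converts such a string to whole hours for comparison with the limits listed in this guide. The function name `slurm_time_to_hours` is a hypothetical helper, and it handles only the `D-HH:MM:SS` and `HH:MM:SS` forms; other values (such as `infinite` or a bare `MM:SS`) are reported as 0.

```shell
#!/bin/bash
# slurm_time_to_hours TIMESTR
# Converts a Slurm time string (D-HH:MM:SS or HH:MM:SS) to whole hours.
# Minutes and seconds are dropped. Hypothetical helper; not part of Slurm.
slurm_time_to_hours() {
    local t=$1 days=0 hours=0
    if [[ $t == *-* ]]; then
        # Form D-HH:MM:SS: split off the day count, then take the hour field.
        days=${t%%-*}
        hours=${t#*-}
        hours=${hours%%:*}
    elif [[ $t == *:*:* ]]; then
        # Form HH:MM:SS: the first field is the hour count.
        hours=${t%%:*}
    fi
    # 10# forces base 10 so fields such as "08" are not parsed as octal.
    echo $(( 10#$days * 24 + 10#$hours ))
}

slurm_time_to_hours 7-00:00:00
slurm_time_to_hours 3-00:00:00

# On a cluster, the input could come from sinfo, for example:
#   slurm_time_to_hours "$(sinfo -h -p single -o '%l')"
```

The two calls correspond to the 7-day and 3-day limits mentioned for the single and workq partitions.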
Job Submission¶
Slurm (Simple Linux Utility for Resource Management) is an open-source, highly scalable cluster management and job scheduling system. It is used to manage job scheduling on the new HPC and LONI clusters. It was originally created at the Livermore Computing Center and has grown into a full-fledged open-source project backed by a large community, commercially supported by the original developers, and installed on many of the Top-500 supercomputers.
Information about the following topics can be found here: