Commonly Used Commands
More detailed information on the Slurm commands to schedule and monitor jobs can be found in the Slurm online documentation.
List of common Slurm commands:
squeue is used to show the status of jobs in the Slurm queues (partitions). Useful options:
-l ("l" for "long"): gives more verbose information
-u someusername: limit output to jobs owned by someusername
--state=pending: limit output to pending (i.e., queued) jobs
--state=running: limit output to running jobs
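These options can be combined, for example `squeue -l -u $USER --state=pending`. As a minimal sketch, assuming a bash-compatible shell, the hypothetical helper `my_jobs` below (it is not part of Slurm) wraps the user and state filters listed above:

```bash
# my_jobs: hypothetical convenience function, not a Slurm command.
# Lists the current user's jobs; an optional argument restricts the
# output to one job state (e.g. pending or running).
my_jobs() {
    # Fall back to id -un if $USER is not set in this shell.
    local user="${USER:-$(id -un)}"
    if [ "$#" -gt 0 ]; then
        squeue -u "$user" --state="$1"
    else
        squeue -u "$user"
    fi
}
```

With this defined (e.g. in ~/.bashrc), `my_jobs pending` lists only your queued jobs, and `my_jobs` with no argument is equivalent to `squeue -u $USER`.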
Below is an example that queries all jobs submitted by the current user (fchen14):
$ squeue -u $USER
JOBID PARTITION NAME USER    ST TIME    NODES NODELIST(REASON)
340   checkpt   bash fchen14 R  1:06:59 1     mike002
339   checkpt   bash fchen14 R  1:07:09 1     mike001
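In this output, ST is the job state (R means running; queued jobs show PD, for pending) and TIME is the elapsed run time. When the information is needed in a script rather than on screen, `-h` (`--noheader`) suppresses the header line and `-o` (`--format`) selects the columns to print, with `%i` being the job ID. A minimal sketch, assuming bash; the function name is hypothetical:

```bash
# count_my_jobs: hypothetical helper, not a Slurm command.
# Prints how many jobs the current user has in the queue (any state).
count_my_jobs() {
    local user="${USER:-$(id -un)}"
    # -h drops the JOBID/PARTITION/... header, -o "%i" prints one job ID per line;
    # awk then prints the number of lines it read (0 if there were none).
    squeue -h -u "$user" -o "%i" | awk 'END { print NR }'
}
```

For the two jobs listed above, this would print 2.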
sinfo is used to view information about Slurm nodes and partitions. In its output, an asterisk after a partition name (checkpt* below) marks the default partition. Typical usage:
$ sinfo
PARTITION AVAIL TIMELIMIT  NODES STATE NODELIST
debug     up    infinite   3     idle  mike[026-027,032]
checkpt*  up    3-00:00:00 2     alloc mike[001-002]
checkpt*  up    3-00:00:00 23    idle  mike[003-025]
single    up    7-00:00:00 2     alloc mike[001-002]
single    up    7-00:00:00 23    idle  mike[003-025]
bigmem    up    7-00:00:00 2     idle  mike[033-034]
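By default sinfo prints one line per combination of partition and node state, so checkpt appears twice above (2 allocated nodes, 23 idle). To get a single number, such as how many nodes in a partition are currently idle, `-h` suppresses the header, `-p` selects the partition, `-t` (`--states`) filters by node state and `-o "%D"` prints only the node count. A sketch assuming bash, with a hypothetical helper name:

```bash
# idle_nodes: hypothetical helper, not a Slurm command.
# Usage: idle_nodes PARTITION
# Prints the number of idle nodes in the given partition.
idle_nodes() {
    local partition="$1"
    # sinfo may print more than one line for a partition, so sum the counts;
    # "n + 0" makes awk print 0 when sinfo prints nothing.
    sinfo -h -p "$partition" -t idle -o "%D" | awk '{ n += $1 } END { print n + 0 }'
}
```

With the cluster in the state shown above, `idle_nodes checkpt` would be expected to print 23.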
scancel is used to signal or cancel jobs. It is typically used together with squeue, which shows the JOBID of the job to cancel:
$ squeue -u fchen14
JOBID PARTITION NAME USER    ST TIME    NODES NODELIST(REASON)
341   checkpt   bash fchen14 R  0:13    1     mike001
340   checkpt   bash fchen14 R  1:50:57 1     mike002
# cancel (delete) job with JOBID 340
$ scancel 340
# the job may briefly show the state "CG" (COMPLETING) right after scancel
$ squeue -u fchen14
JOBID PARTITION NAME USER    ST TIME    NODES NODELIST(REASON)
340   checkpt   bash fchen14 CG 1:51:08 1     mike002
341   checkpt   bash fchen14 R  0:41    1     mike001
# shortly afterwards, the cancelled job 340 no longer appears
$ squeue -u fchen14
JOBID PARTITION NAME USER    ST TIME    NODES NODELIST(REASON)
341   checkpt   bash fchen14 R  1:08    1     mike001
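scancel also accepts several job IDs at once (e.g. `scancel 340 341`), and `scancel -u $USER` cancels all of your jobs, so use that form with care. Because a cancelled job can linger briefly in the CG state, a script that must wait until the job is really gone can poll squeue. A minimal sketch, assuming bash; the function name and the 60-second default timeout are arbitrary, hypothetical choices:

```bash
# cancel_and_wait: hypothetical helper, not a Slurm command.
# Usage: cancel_and_wait JOBID [TIMEOUT_SECONDS]
# Cancels the job, then polls squeue once per second until the job is no
# longer listed (i.e. it has left the CG state) or the timeout expires.
cancel_and_wait() {
    local jobid="$1" timeout="${2:-60}" waited=0
    scancel "$jobid" || return 1
    # squeue -h -j JOBID -o "%i" prints the job ID while the job is still listed;
    # errors (e.g. an unknown job ID) are discarded and count as "gone".
    while [ -n "$(squeue -h -j "$jobid" -o "%i" 2>/dev/null)" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "job $jobid still listed after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

For example, `cancel_and_wait 340` returns 0 once job 340 has disappeared from squeue, and 1 if scancel fails or the job is still listed after the timeout.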
scontrol is used to view or modify Slurm configuration and state. A typical use for regular users is to view the detailed status of a job:
$ squeue -u fchen14 # show all jobs of user fchen14
JOBID PARTITION NAME USER    ST TIME    NODES NODELIST(REASON)
341   checkpt   bash fchen14 R  1:29:20 1     mike001
$ scontrol show job 341
JobId=341 JobName=bash
UserId=fchen14(32584) GroupId=Admins(10000) MCS_label=N/A
Priority=1 Nice=0 Account=hpc_hpcadmin6 QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=01:29:31 TimeLimit=12:00:00 TimeMin=N/A
SubmitTime=2020-05-07T10:47:52 EligibleTime=2020-05-07T10:47:52
AccrueTime=Unknown
StartTime=2020-05-07T10:47:52 EndTime=2020-05-07T22:47:57 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-05-07T10:47:52
Partition=checkpt AllocNode:Sid=mike1:28374
ReqNodeList=(null) ExcNodeList=(null)
NodeList=mike001
BatchHost=mike001
NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=8,mem=22332M,node=1,billing=8
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=22332M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
Command=/bin/bash
WorkDir=/home/fchen14/test
Power=
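The full record is long; in scripts it is often enough to read a single field such as JobState. `scontrol -o` (`--oneliner`) prints the whole record on one line of space-separated Key=Value pairs, which is easy to split. A minimal sketch, assuming bash and that the value you want contains no spaces (fields such as Command can contain spaces and would be cut off); the function name is hypothetical:

```bash
# job_field: hypothetical helper, not a Slurm command.
# Usage: job_field JOBID KEY   e.g. job_field 341 JobState
# Prints the value of KEY from "scontrol -o show job JOBID", or nothing if absent.
job_field() {
    local jobid="$1" key="$2"
    # Put each Key=Value pair on its own line, then print the text after the
    # first "=" of the matching key (values like TRES may contain more "=").
    scontrol -o show job "$jobid" |
        tr ' ' '\n' |
        awk -v key="$key" -F= '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}
```

For job 341 above, `job_field 341 JobState` would print RUNNING and `job_field 341 NodeList` would print mike001.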