SuperMike-II
SuperMike-II, named after SuperMike, LSU's original large Linux cluster launched in 2002, is 10 times faster than its immediate predecessor, Tezpur.
SuperMike-II is a 440-compute-node cluster with a peak performance of 146 TFlops, running the Red Hat Enterprise Linux 6 operating system. Each node contains two 8-core Sandy Bridge Xeon 64-bit processors operating at a core frequency of 2.6 GHz. Fifty of the compute nodes also have two NVIDIA M2090 GPUs each, which together provide an additional 66 TFlops of peak performance.
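The peak figures above can be reproduced from the node counts. This worked check assumes, from vendor specifications rather than from this guide, that each Sandy Bridge core completes 8 double-precision floating-point operations per cycle using AVX, and that each M2090 peaks at about 665 GFlops in double precision:

$$
R_{\mathrm{CPU}} = 440\ \text{nodes} \times 16\ \tfrac{\text{cores}}{\text{node}} \times 2.6\ \mathrm{GHz} \times 8\ \tfrac{\text{FLOPs}}{\text{cycle}} = 146{,}432\ \mathrm{GFlops} \approx 146\ \mathrm{TFlops}
$$

$$
R_{\mathrm{GPU}} = 50\ \text{nodes} \times 2\ \tfrac{\text{GPUs}}{\text{node}} \times 665\ \mathrm{GFlops} = 66{,}500\ \mathrm{GFlops} \approx 66.5\ \mathrm{TFlops}
$$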
SuperMike-II is open for general use to LSU users. A user guide is available here.
- 2 Interactive Nodes
  - Two 2.6 GHz 8-Core Sandy Bridge Xeon 64-bit Processors
  - 64GB 1666MHz RAM
  - 500GB HD
  - 40 Gigabit/sec InfiniBand network interface
  - 1 Gigabit Ethernet network interface
  - Red Hat Enterprise Linux 6
- 382 Compute Nodes
  - Two 2.6 GHz 8-Core Sandy Bridge Xeon 64-bit Processors
  - 32GB 1666MHz RAM
  - 500GB HD
  - 40 Gigabit/sec InfiniBand network interface
  - 1 Gigabit Ethernet network interface
  - Red Hat Enterprise Linux 6
- 50 Compute Nodes
  - Two 2.6 GHz 8-Core Sandy Bridge Xeon 64-bit Processors
  - Two NVIDIA M2090 GPUs
  - 64GB 1666MHz RAM
  - 500GB HD
  - 40 Gigabit/sec InfiniBand network interface
  - 1 Gigabit Ethernet network interface
  - Red Hat Enterprise Linux 6
- 8 Compute Nodes
  - Two 2.6 GHz 8-Core Sandy Bridge Xeon 64-bit Processors
  - 256GB 1666MHz RAM
  - 500GB HD
  - 40 Gigabit/sec InfiniBand network interface
  - 1 Gigabit Ethernet network interface
  - Red Hat Enterprise Linux 6
- Cluster Storage
  - 400 TB DDN Lustre High-Performance disk
  - 2 TB NFS-mounted /home disk storage
1. Access to SuperMike-II
*nix and Mac Users - Connect with a command similar to the following:
$ ssh -X username@mike.hpc.lsu.edu
You will then be prompted for your password. The -X flag sets up X11 forwarding automatically.
Windows Users - For a Windows client please use the PuTTY utility.
If you have forgotten your password, or you wish to reset it, see here (click "Forgot your password?").
2. User Environment
SuperMike-II uses softenv to add software to the user's environment. Running softenv on the cluster displays a list of the available software keys:
```shell
$ softenv
+ImageMagick-6.4.6.9-intel-11.1
+ParMetis-3.1.1-intel-11.1-mpich-1.2.7p1
+R-2.8.1-gcc-4.3.2
...
```
To add software to your environment, add the appropriate key to your ~/.soft file. For example, to add the ImageMagick package to your environment, your ~/.soft file would contain:
```shell
$ cat ~/.soft
+ImageMagick-6.4.6.9-intel-11.1
@default
```
The order in which you add keys to ~/.soft is important: the first occurrence of a setting takes precedence.
Once the entries are to your liking, run the resoft command to apply them:
$ resoft
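Editing ~/.soft by hand works fine. As a sketch of the ordering rule above, the hypothetical shell function below (it is not part of softenv) inserts a key just ahead of the first @default line, so that the new key takes precedence over the defaults, and skips keys that are already listed. It assumes ~/.soft holds one key or macro per line.

```shell
# Hypothetical helper (not part of softenv): add a key to ~/.soft ahead of
# @default so that it takes precedence over the default settings.
# Usage: add_soft_key KEY [SOFT_FILE]   (SOFT_FILE defaults to ~/.soft)
add_soft_key() {
    local key="$1" soft="${2:-$HOME/.soft}"
    if [ ! -s "$soft" ]; then
        # New or empty file: write the key followed by the site defaults.
        printf '%s\n@default\n' "$key" > "$soft"
    elif grep -qxF -- "$key" "$soft"; then
        # Key already present on a line of its own: nothing to do.
        return 0
    elif grep -qxF '@default' "$soft"; then
        # Print the key just before the first @default line, copy the rest.
        awk -v k="$key" '!done && $0 == "@default" { print k; done = 1 } { print }' \
            "$soft" > "$soft.tmp" && mv "$soft.tmp" "$soft"
    else
        # No @default line: append the key at the end.
        printf '%s\n' "$key" >> "$soft"
    fi
}
```

For example, `add_soft_key +ImageMagick-6.4.6.9-intel-11.1 && resoft` run against an empty ~/.soft would produce the two-line file shown earlier in this section.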
If your code needs to link against a library from a given package, note that all software is installed under /usr/local/packages/, e.g.:
```shell
$ ls /usr/local/packages/
apache_ant  boost     fuse     gold      hdf5         ...
arpack      boostjam  gamess   graphviz  hypre
atlas       condor    git      gromacs   ImageMagick
blacs       fftw      gnuplot  gsl       iozone
```
3. File Storage
3.1. Home Directory
The /home file system quota on SuperMike-II is 5 GB. Files can be stored on /home permanently, which makes it an ideal place for your source code and executables. The /home file system is meant for interactive use such as editing and active code development. Do not use /home for batch job I/O.
3.2. Work (Scratch) Directory
The /work volume is meant for the input and output of running batch jobs, not for long-term storage. We expect files to be copied to other locations or deleted in a timely manner, usually within 30-120 days. For performance reasons on all volumes, our policy is to limit the number of files per directory to around 10,000 and the total number of files to about 500,000.
There is no quota on the /work file system on SuperMike-II. If it becomes over-utilized, we will enforce a 30-day purge policy: any files that have not been accessed in the last 30 days will be permanently deleted. A weekly email will be sent to users targeted for a purge, informing them of their /work utilization.
Please do not try to circumvent the purge process by changing file dates. We expect most files over 30 days old to disappear. Attempts to circumvent the purge may lead to restricted access to the /work volume or to the cluster.
Please note that although /work has no quota, its capacity is finite, so keep your usage reasonable. When the utilization of /work exceeds 80%, a 14-day purge may be performed on users using more than 2 TB or having more than 500,000 files. Should disk space become critically low, all files not accessed in 14 days will be purged, and more drastic measures may be taken if needed. Users occupying the largest portions of the /work volume will be contacted when problems arise and will be expected to take action to help resolve them.
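To see where you stand relative to these limits, something like the following hypothetical helper could be run on an interactive node. It is a sketch, not a site-provided tool: it assumes your directory is /work/$USER (adjust the path if yours differs) and uses find's -atime test, which, like the purge policy, is based on when a file was last accessed.

```shell
# Hypothetical helper (not a site-provided tool): report a /work directory
# against the file-count and age guidelines above.
# Usage: work_report [DIR] [DAYS] [PER_DIR_LIMIT]
#   defaults: DIR=/work/$USER, DAYS=30, PER_DIR_LIMIT=10000
work_report() {
    local dir="${1:-/work/$USER}" days="${2:-30}" limit="${3:-10000}"
    # Total regular files; compare with the ~500,000-file guideline.
    echo "total files: $(( $(find "$dir" -type f | wc -l) ))"
    # find rounds ages down to whole days, so -atime +30 matches files
    # last accessed more than 30 full days ago (purge candidates).
    echo "not accessed in more than $days days: $(( $(find "$dir" -type f -atime +"$days" | wc -l) ))"
    # Flag directories whose direct entries exceed the per-directory limit.
    find "$dir" -type d | while IFS= read -r d; do
        n=$(( $(find "$d" -mindepth 1 -maxdepth 1 | wc -l) ))
        if [ "$n" -gt "$limit" ]; then
            echo "over $limit entries: $d ($n)"
        fi
    done
}
```

For example, `work_report /work/$USER 14` reports against the 14-day threshold, and `find /work/$USER -type f -atime +30` lists the individual files that would be affected by a 30-day purge.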
4. Programming/Compiling
Version 11.1 of the Intel compilers is loaded by default. Codes can be compiled according to the following chart:
| Language | Serial Codes | MPI Codes | OpenMP Codes | Hybrid Codes |
|---|---|---|---|---|
| Fortran | ifort | mpif90 | ifort -openmp | mpif90 -openmp |
| C | icc | mpicc | icc -openmp | mpicc -openmp |
| C++ | icpc | mpiCC | icpc -openmp | mpiCC -openmp |
Default MPI: Open MPI 1.6.2, compiled with Intel 13.0.0.
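The chart follows a simple pattern: OpenMP builds add -openmp to the serial compiler, and hybrid builds add it to the MPI wrapper. The hypothetical function below encodes that pattern. It only prints the command, and it assumes the default Intel compilers and MPI wrappers listed above are on your PATH when you use the result.

```shell
# Hypothetical helper: print the compiler command from the chart above.
# Usage: mike_compiler LANGUAGE MODEL
#   LANGUAGE: fortran | c | c++     MODEL: serial | mpi | openmp | hybrid
mike_compiler() {
    local serial mpi
    case "$1" in
        fortran) serial=ifort; mpi=mpif90 ;;
        c)       serial=icc;   mpi=mpicc ;;
        c++)     serial=icpc;  mpi=mpiCC ;;
        *) echo "unknown language: $1" >&2; return 1 ;;
    esac
    case "$2" in
        serial) echo "$serial" ;;
        mpi)    echo "$mpi" ;;
        openmp) echo "$serial -openmp" ;;   # serial compiler + OpenMP flag
        hybrid) echo "$mpi -openmp" ;;      # MPI wrapper + OpenMP flag
        *) echo "unknown model: $2" >&2; return 1 ;;
    esac
}
```

For instance, `$(mike_compiler c hybrid) -o hello hello.c` (with a hypothetical source file hello.c) would expand to `mpicc -openmp -o hello hello.c`.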
5. Running Jobs
Below are the possible job queues to choose from:
- single - Used for jobs that will only execute on a single node, i.e. nodes=1:ppn<=16.
- workq - Used for jobs that will use at least one node, i.e. nodes>=1:ppn=16. Currently, this queue has a limit of 72 hours (3 days) of wallclock time.
- checkpt - Used for jobs that will use at least one node. Currently, this queue has a limit of 72 hours (3 days) of wallclock time.
- bigmem - This queue has a limit of 48 hours (2 days) of wallclock time.
- bigmemtb - Used for jobs that require large memory, up to 1TB. This queue has a limit of 48 hours (2 days) of wallclock time.
- gpu - Used for jobs that require the GPU nodes. This queue has a limit of 24 hours (1 day) of wallclock time.
- lasigma - Used for jobs charged to the hpc_lasigma allocation, i.e. -A hpc_lasigma. The lasigma queue provides access to 28 compute nodes with two NVIDIA M2090 GPUs each and has a limit of 72 hours (3 days) of wallclock time. You cannot charge jobs run on other queues to the hpc_lasigma allocation.
- mwfa - Used for jobs charged to the hpc_mwfa allocation, i.e. -A hpc_mwfa. The mwfa queue provides access to 8 compute nodes and has a limit of 72 hours (3 days) of wallclock time. You cannot charge jobs run on other queues to the hpc_mwfa allocation.
| Queue Name | Max Walltime (hours) | Max Nodes (per job) |
|---|---|---|
| workq | 72 | 128 |
| checkpt | 72 | 200 |
| single | 72 | 1 |
| gpu | 24 | 16 |
| bigmem | 48 | 2 |
| bigmemtb | 48 | 1 |
| lasigma | 72 | 28 |
| mwfa | 72 | 8 |
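Before submitting, it can help to check a request against the table. The sketch below hard-codes the limits listed above (they may change, so treat the values as a snapshot) and takes walltime in whole hours; check_request is a hypothetical name, not a site command.

```shell
# Hypothetical helper: check a planned request against the per-queue limits
# in the table above. Usage: check_request QUEUE NODES HOURS
# Returns 0 if the request fits, 1 if it exceeds a limit, 2 for an unknown queue.
check_request() {
    local queue="$1" nodes="$2" hours="$3" max_h max_n
    case "$queue" in
        workq)    max_h=72; max_n=128 ;;
        checkpt)  max_h=72; max_n=200 ;;
        single)   max_h=72; max_n=1 ;;
        gpu)      max_h=24; max_n=16 ;;
        bigmem)   max_h=48; max_n=2 ;;
        bigmemtb) max_h=48; max_n=1 ;;
        lasigma)  max_h=72; max_n=28 ;;
        mwfa)     max_h=72; max_n=8 ;;
        *) echo "unknown queue: $queue" >&2; return 2 ;;
    esac
    if [ "$nodes" -gt "$max_n" ]; then
        echo "$queue allows at most $max_n nodes per job" >&2
        return 1
    fi
    if [ "$hours" -gt "$max_h" ]; then
        echo "$queue allows at most $max_h hours of walltime" >&2
        return 1
    fi
}
```

For example, `check_request gpu 4 48` fails with a message because the gpu queue allows at most 24 hours.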
Single Queue Job Script Template
```shell
$ cat ~/script
#!/bin/bash
#PBS -q single
#PBS -l nodes=1:ppn=1
#PBS -l walltime=HH:MM:SS
#PBS -o desired_output_file_name
#PBS -N NAME_OF_JOB
/path/to/your/executable
```
Workq Queue Job Script Template
```shell
$ cat ~/script
#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=16
#PBS -l walltime=HH:MM:SS
#PBS -o desired_output_file_name
#PBS -j oe
#PBS -N NAME_OF_JOB
# MPI jobs would execute:
# mpirun -np 16 -machinefile $PBS_NODEFILE /path/to/your/executable
# OpenMP jobs would execute:
# export OMP_NUM_THREADS=16; /path/to/your/executable
```
Checkpt Queue Job Script Template
```shell
$ cat ~/script
#!/bin/bash
#PBS -q checkpt
#PBS -l nodes=1:ppn=16
#PBS -l walltime=HH:MM:SS
#PBS -o desired_output_file_name
#PBS -j oe
#PBS -N NAME_OF_JOB
# MPI jobs would execute:
# mpirun -np 16 -machinefile $PBS_NODEFILE /path/to/your/executable
# OpenMP jobs would execute:
# export OMP_NUM_THREADS=16; /path/to/your/executable
```
Bigmem Queue Job Script Template
```shell
$ cat ~/script
#!/bin/bash
#PBS -q bigmem
# Request to use a 48 GB node, similarly could request mem96
#PBS -l nodes=1:ppn=16:mem48
#PBS -l walltime=HH:MM:SS
#PBS -o desired_output_file_name
#PBS -j oe
#PBS -N NAME_OF_JOB
# MPI jobs would execute:
# mpirun -np 8 -machinefile $PBS_NODEFILE /path/to/your/executable
# OpenMP jobs would execute:
# export OMP_NUM_THREADS=8; /path/to/your/executable
```
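The workq and checkpt templates hard-code `-np 16`, which matches a single node with ppn=16. A common alternative is to derive the rank count from $PBS_NODEFILE, which Torque fills with one line per allocated core. The hypothetical make_mpi_job function below prints a workq- or checkpt-style script that does this. It assumes ppn=16, and it also changes into $PBS_O_WORKDIR (the directory qsub was run from), since Torque otherwise starts jobs in your home directory by default.

```shell
# Hypothetical helper: print an MPI job script in the style of the templates
# above. Usage: make_mpi_job QUEUE NODES HH:MM:SS JOB_NAME EXECUTABLE
make_mpi_job() {
    local queue="$1" nodes="$2" walltime="$3" name="$4" exe="$5"
    # Plain $vars are filled in now; \$ is left for the job itself to expand.
    cat <<EOF
#!/bin/bash
#PBS -q $queue
#PBS -l nodes=$nodes:ppn=16
#PBS -l walltime=$walltime
#PBS -o $name.out
#PBS -j oe
#PBS -N $name
# Torque starts jobs in \$HOME by default; run where qsub was called.
cd "\$PBS_O_WORKDIR"
# \$PBS_NODEFILE lists each node once per requested core (ppn), so its
# line count is the total number of MPI ranks.
NPROCS=\$(wc -l < "\$PBS_NODEFILE")
mpirun -np "\$NPROCS" -machinefile "\$PBS_NODEFILE" $exe
EOF
}
```

For example, `make_mpi_job workq 2 12:00:00 myjob /path/to/your/executable > ~/script` writes a two-node script that starts 32 MPI ranks, which can then be submitted with qsub.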
Submit the job by executing:
$ qsub script
6. Monitoring Jobs
The following commands can be used to view and modify the queue:
- qdel jobid - deletes a PBS job in the queue.
- qstat - shows the status of your jobs and of other users' jobs in the queue. It can also show other information about a job, such as the number of nodes it intends to use and the name of the queue it is in.
- showq - displays information about jobs in the batch system.
- showstart jobid - gives an estimated starting time for your job.
- checkjob jobid - displays detailed job state information.
More detailed information on using the Torque PBS commands and Moab to schedule and monitor jobs can be found in Adaptive Computing's online documentation.
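When a script needs to wait for a job before continuing, qstat can be polled. The hypothetical wait_for_job function below is a sketch that reads the job_state field from `qstat -f`. It assumes a job has finished once it reports state C (completed) or is no longer known to the server, which is what happens after completed jobs are purged from the queue.

```shell
# Hypothetical helper: block until a Torque job has finished.
# Usage: wait_for_job JOBID [POLL_SECONDS]   (POLL_SECONDS defaults to 60)
wait_for_job() {
    local id="$1" interval="${2:-60}" state
    while :; do
        # "qstat -f" prints a line such as "    job_state = R".
        state=$(qstat -f "$id" 2>/dev/null | awk '$1 == "job_state" { print $3 }')
        case "$state" in
            ""|C) return 0 ;;   # unknown to the server, or completed
        esac
        sleep "$interval"
    done
}
```

For example, `jobid=$(qsub script); wait_for_job "$jobid" 120` would submit a job and then check on it every two minutes. Polling every few minutes rather than every few seconds keeps the load on the batch server low.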