Eric
Eric is named after political science professor Eric Voegelin, the first Boyd Professor at LSU (a Boyd Professorship is the highest and most prestigious academic rank LSU can confer on a professor). It is a 128-compute-node cluster with a peak performance of 4.77 TFlops, running the Red Hat Enterprise Linux 4 operating system. Each node contains two dual-core 64-bit Xeon processors operating at a core frequency of 2.33 GHz. Eric is a LONI Dell Linux cluster housed in Coates Hall at LSU.
- 128 Compute Nodes
  - Two 2.33 GHz Dual Core Xeon 64-bit Processors
  - 4 GB RAM
  - 10 Gb/sec Infiniband network interface
  - 10/100/1000 Ethernet network interface
  - Red Hat Enterprise Linux 4
- 1 Interactive Node
  - Two 3.00 GHz Dual Core Xeon 64-bit Processors
  - 8 GB RAM
  - 10/100/1000 Ethernet network interface
  - Red Hat Enterprise Linux 4
- Cluster Storage
  - 2.3 TB of local storage
  - 12 TB Lustre filesystem
1. Access to Eric
*nix and Mac Users - Issue a command similar to the following:
$ ssh -X username@eric.loni.org
The user is then prompted for their password. The -X flag sets up X11 forwarding automatically.
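For frequent logins, the same options can be stored in the OpenSSH client configuration file. The following is a minimal sketch rather than an official LONI recipe: the helper name `add_eric_host_entry` and the host alias `eric` are made up for illustration, while `HostName`, `User`, and `ForwardX11` are standard OpenSSH client options.

```shell
#!/bin/bash
# Hypothetical helper: append a host alias for Eric to an OpenSSH client config.
# Usage: add_eric_host_entry USERNAME [CONFIG_FILE]
add_eric_host_entry() {
    local user="$1"
    local config="${2:-$HOME/.ssh/config}"

    # Make sure the directory that holds the config file exists.
    mkdir -p "$(dirname "$config")"

    # Do nothing if the alias is already defined.
    if [ -f "$config" ] && grep -q '^Host eric$' "$config"; then
        return 0
    fi

    # "ForwardX11 yes" has the same effect as passing -X on the command line.
    cat >> "$config" <<EOF
Host eric
    HostName eric.loni.org
    User $user
    ForwardX11 yes
EOF
}
```

After running `add_eric_host_entry username` once, `ssh eric` should behave like the longer command shown above.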
Windows Users - For a Windows client please use the PuTTY utility.
If you have forgotten your password, or you wish to reset it, see here (click "Forgot your password?").
2. User Environment
Eric uses softenv to add software to the user's environment. Executing softenv on a cluster displays a list of the available software:
    $ softenv
    +ImageMagick-6.4.6.9-intel-11.1
    +ParMetis-3.1.1-intel-11.1-mpich-1.2.7p1
    +R-2.8.1-gcc-4.3.2
    ...
In order to add software to your environment, you'll need to add the appropriate key to your ~/.soft file. For example, to add the package ImageMagick to your user environment, you would need to add the following:
    $ cat ~/.soft
    +ImageMagick-6.4.6.9-intel-11.1
    @default
The order in which you add keys to ~/.soft is important. The first occurrence of a setting takes precedence.
Once the entries are to your liking, you must then execute the command resoft, i.e.:
$ resoft
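Editing ~/.soft by hand works fine; for scripted setups, the step can be sketched as a small shell function. This is an illustration only: `add_soft_key` is a hypothetical helper, not part of softenv, and it assumes that placing the new key on the first line (ahead of `@default`) is the desired ordering, as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: put a softenv key at the top of a .soft file unless
# the key is already listed.
# Usage: add_soft_key KEY [SOFT_FILE]
add_soft_key() {
    local key="$1"
    local soft_file="${2:-$HOME/.soft}"
    local tmp

    # Assumption: a missing file should start out with just @default.
    [ -f "$soft_file" ] || echo "@default" > "$soft_file"

    # Skip keys that are already present (exact, whole-line match).
    if grep -qxF -- "$key" "$soft_file"; then
        return 0
    fi

    # Write the new key first so that it precedes all existing entries.
    tmp=$(mktemp)
    { echo "$key"; cat "$soft_file"; } > "$tmp"
    cat "$tmp" > "$soft_file"
    rm -f "$tmp"
}
```

A typical call would be `add_soft_key +ImageMagick-6.4.6.9-intel-11.1`, followed by `resoft` to apply the change.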
If your code needs to link to a library from a given package, you will find all software installed under /usr/local/packages/, e.g.:
    $ ls /usr/local/packages/
    apache_ant  boost     fuse     gold      hdf5         ...
    arpack      boostjam  gamess   graphviz  hypre
    atlas       condor    git      gromacs   ImageMagick
    blacs       fftw      gnuplot  gsl       iozone
3. File Storage
3.1. Home Directory
The /home file system quota on Eric is 5 GB. Files can be stored on /home permanently, which makes it an ideal place for your source code and executables. The /home file system is meant for interactive use such as editing and active code development. Do not use /home for batch job I/O.
3.2. Work (Scratch) Directory
The /work volume is meant for the input and output of executing batch jobs, not for long-term storage. We expect files to be copied to other locations or deleted in a timely manner, usually within 30-120 days. For performance reasons on all volumes, our policy is to limit the number of files per directory to around 10,000 and total files to about 500,000.
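To check a directory tree against these file-count guidelines, something like the following sketch may help. The function names are hypothetical, and the code assumes GNU find (for `-printf`), which is what Linux distributions normally ship; directory names containing newlines are not handled.

```shell
#!/bin/bash
# Hypothetical helper: print "COUNT DIRECTORY" for every directory under ROOT
# that directly contains more than LIMIT regular files.
# Usage: dirs_over_limit ROOT [LIMIT]
dirs_over_limit() {
    local root="$1"
    local limit="${2:-10000}"

    # Print the parent directory of every regular file, then tally per directory.
    find "$root" -type f -printf '%h\n' 2>/dev/null \
        | sort \
        | uniq -c \
        | awk -v limit="$limit" '$1 > limit { count = $1; sub(/^ *[0-9]+ /, ""); print count, $0 }'
}

# Hypothetical helper: total number of regular files under ROOT
# (to compare against the guideline of about 500,000).
# Usage: count_files ROOT
count_files() {
    find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}
```

Running `dirs_over_limit` on your own /work directory with no limit argument uses the 10,000 files-per-directory guideline mentioned above.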
The /work file system quota on Eric is 100 GB. If the volume becomes overutilized, we will enforce a 30-day purge policy, which means that any files that have not been accessed in the last 30 days will be permanently deleted. A weekly email message will be sent to users targeted for a purge, informing them of their /work utilization.
Please do not try to circumvent the removal process by altering file dates. We expect most files over 30 days old to disappear. Attempts to circumvent the purge process may lead to access restrictions on the /work volume or the cluster.
Please note that the /work volume is not unlimited, so keep your usage to a reasonable amount. When the utilization of /work exceeds 80%, a 14-day purge may be performed on users using more than 2 TB or having more than 500,000 files. Should disk space become critically low, all files not accessed in 14 days will be purged, and even more drastic measures may be taken if needed. Users using the largest portions of the /work volume will be contacted when problems arise and will be expected to take action to help resolve issues.
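To see which files are likely purge candidates, so they can be copied elsewhere or deleted in time, a listing by access time can be sketched as below. The helper name is hypothetical, and how the actual purge selects files is not documented here, so treat the result as an approximation.

```shell
#!/bin/bash
# Hypothetical helper: list regular files under DIR whose last access time is
# more than DAYS days in the past.
# Usage: list_stale_files DIR [DAYS]
list_stale_files() {
    local dir="$1"
    local days="${2:-30}"

    # find's "-atime +N" ignores fractional days, so it matches files last
    # accessed at least N+1 full 24-hour periods ago.
    find "$dir" -type f -atime +"$days" 2>/dev/null
}
```

For example, `list_stale_files /path/to/your/work/dir 14` corresponds to the 14-day threshold; `du -sh` on the same directory reports the total space in use.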
4. Programming/Compiling
Version 11.1 of the Intel compilers is loaded by default. Codes can be compiled according to the following chart:
| Language | Serial Codes | MPI Codes | OpenMP Codes | Hybrid Codes |
|---|---|---|---|---|
| Fortran | ifort | mpif90 | ifort -openmp | mpif90 -openmp |
| C | icc | mpicc | icc -openmp | mpicc -openmp |
| C++ | icpc | mpiCC | icpc -openmp | mpiCC -openmp |
Default MPI: mvapich 1.1, compiled with the Intel 11.1 compilers.
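For build scripts, the chart can be expressed as a lookup. The sketch below only echoes the command named in the chart; `compiler_for` is a hypothetical helper, and the language and mode keywords it accepts are chosen here for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the compiler command from the chart for a
# language (fortran, c, c++) and a mode (serial, mpi, openmp, hybrid).
# Usage: compiler_for LANGUAGE MODE
compiler_for() {
    local lang="$1" mode="$2"
    local serial mpi

    # Pick the serial compiler and the MPI wrapper for the language.
    case "$lang" in
        fortran) serial=ifort; mpi=mpif90 ;;
        c)       serial=icc;   mpi=mpicc  ;;
        c++)     serial=icpc;  mpi=mpiCC  ;;
        *) echo "unknown language: $lang" >&2; return 1 ;;
    esac

    # OpenMP and hybrid builds add the -openmp flag, as in the chart.
    case "$mode" in
        serial) echo "$serial" ;;
        mpi)    echo "$mpi" ;;
        openmp) echo "$serial -openmp" ;;
        hybrid) echo "$mpi -openmp" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

A build line could then read `$(compiler_for c hybrid) -o hello hello.c`, where `hello.c` is a placeholder source file.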
5. Running Jobs
Below are the possible job queues to choose from:
- single - Used for jobs that will only execute on a single node, i.e. nodes=1:ppn<=4.
- workq - Used for jobs that will use at least one node, i.e. nodes>=1:ppn=4. Currently, this queue has a limit of 72 hours (3 days) of wallclock time.
- checkpt - Used for jobs that will use at least one node.
| Queue Name | Max Walltime (hours) | Max Nodes (per job) |
|---|---|---|
| workq | 72 | 24 |
| checkpt | 72 | 48 |
| single | 72 | 1 |
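Before submitting, a request can be checked against the limits in this table. The following sketch hard-codes the table values and assumes the walltime column is in hours for every queue (the text states this explicitly only for workq); `check_request` is a hypothetical helper, not a scheduler command.

```shell
#!/bin/bash
# Hypothetical helper: check a resource request against the queue table.
# Usage: check_request QUEUE NODES HOURS
# Returns 0 when the request fits, 1 (with a message on stderr) otherwise.
check_request() {
    local queue="$1" nodes="$2" hours="$3"
    local max_nodes max_hours=72   # assumption: 72 means hours for all queues

    # Per-job node limits taken from the table.
    case "$queue" in
        single)  max_nodes=1  ;;
        workq)   max_nodes=24 ;;
        checkpt) max_nodes=48 ;;
        *) echo "unknown queue: $queue" >&2; return 1 ;;
    esac

    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$max_nodes" ]; then
        echo "$queue allows 1 to $max_nodes node(s) per job, got $nodes" >&2
        return 1
    fi
    if [ "$hours" -gt "$max_hours" ]; then
        echo "$queue allows at most $max_hours hours of walltime, got $hours" >&2
        return 1
    fi
}
```

For instance, `check_request workq 24 72` succeeds, while `check_request workq 25 10` fails because workq is limited to 24 nodes per job.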
Single Queue Job Script Template
    $ cat ~/script
    #!/bin/bash
    #PBS -q single
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=HH:MM:SS
    #PBS -o desired_output_file_name
    #PBS -N NAME_OF_JOB
    /path/to/your/executable
Workq Queue Job Script Template
    $ cat ~/script
    #!/bin/bash
    #PBS -q workq
    #PBS -l nodes=1:ppn=4
    #PBS -l walltime=HH:MM:SS
    #PBS -o desired_output_file_name
    #PBS -j oe
    #PBS -N NAME_OF_JOB
    # mpi jobs would execute:
    # mpirun -np 4 -machinefile $PBS_NODEFILE /path/to/your/executable
    # OpenMP jobs would execute:
    # export OMP_NUM_THREADS=4; /path/to/your/executable
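The template hard-codes `-np 4`, which matches `nodes=1:ppn=4`. For multi-node requests the process count has to change with it. Torque lists one line per allocated processor slot in the file named by `$PBS_NODEFILE`, so the count can be derived at run time. A minimal sketch, with hypothetical helper names:

```shell
#!/bin/bash
# Hypothetical helper: count the processor slots listed in a PBS node file.
# Usage: count_slots [NODEFILE]   (defaults to $PBS_NODEFILE inside a job)
count_slots() {
    local nodefile="${1:-$PBS_NODEFILE}"
    # One line per slot, so the line count is the total number of processes.
    wc -l < "$nodefile" | tr -d ' '
}

# Hypothetical helper: number of distinct nodes in the allocation.
# Usage: count_nodes [NODEFILE]
count_nodes() {
    local nodefile="${1:-$PBS_NODEFILE}"
    sort -u "$nodefile" | wc -l | tr -d ' '
}
```

Inside a job script, the MPI line could then become `mpirun -np "$(count_slots)" -machinefile $PBS_NODEFILE /path/to/your/executable`.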
Checkpt Queue Job Script Template
    $ cat ~/script
    #!/bin/bash
    #PBS -q checkpt
    #PBS -l nodes=1:ppn=4
    #PBS -l walltime=HH:MM:SS
    #PBS -o desired_output_file_name
    #PBS -j oe
    #PBS -N NAME_OF_JOB
    # mpi jobs would execute:
    # mpirun -np 4 -machinefile $PBS_NODEFILE /path/to/your/executable
    # OpenMP jobs would execute:
    # export OMP_NUM_THREADS=4; /path/to/your/executable
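The three templates differ only in a few fields, so they can be generated from parameters. The sketch below prints an MPI job script to stdout; `make_pbs_script` is a hypothetical helper, and naming the output file after the job is an assumption made here for brevity.

```shell
#!/bin/bash
# Hypothetical helper: print a PBS job script for an MPI run to stdout.
# Usage: make_pbs_script QUEUE NODES PPN WALLTIME JOB_NAME EXECUTABLE
make_pbs_script() {
    local queue="$1" nodes="$2" ppn="$3" walltime="$4" name="$5" exe="$6"
    local np=$(( nodes * ppn ))   # total MPI processes = nodes x cores per node

    # \$PBS_NODEFILE is escaped so that it is expanded when the job runs,
    # not when the script is generated.
    cat <<EOF
#!/bin/bash
#PBS -q $queue
#PBS -l nodes=$nodes:ppn=$ppn
#PBS -l walltime=$walltime
#PBS -o $name.out
#PBS -j oe
#PBS -N $name

mpirun -np $np -machinefile \$PBS_NODEFILE $exe
EOF
}
```

For example, `make_pbs_script workq 4 4 01:00:00 mytest /path/to/your/executable > ~/script` writes a 16-process job script. Scripts are normally submitted with Torque's standard `qsub` command, as in `qsub ~/script`, which prints the ID of the new job.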
6. Monitoring Jobs
The following commands can be used to view or modify the queue:
- qdel jobid - deletes a PBS job in the queue.
- qstat - shows you the status of your job and the jobs of others in the queue. It can also show other information about your job, such as the number of nodes it intends to use, the name of the queue it is in, etc.
- showq - displays information about jobs within the batch system.
- showstart jobid - gives an estimated starting time for your job.
- checkjob jobid - displays detailed job state information.
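When these commands are used from scripts, the job state usually has to be extracted from the text output. The following sketch reads qstat-style text on stdin and prints the state column for one job. It assumes the default six-column Torque layout (Job id, Name, User, Time Use, S, Queue), which may differ between versions; `job_state` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read qstat-style text on stdin and print the state
# column (for example Q = queued, R = running) of one job.
# Assumes the default six-column layout: Job id, Name, User, Time Use, S, Queue.
# Usage: qstat | job_state JOBID
job_state() {
    local jobid="$1"
    # The first field looks like NUMBER.HOST; compare only the numeric part.
    awk -v id="$jobid" '{ split($1, parts, "."); if (parts[1] == id) { print $5; exit } }'
}
```

A call such as `qstat | job_state 1234` prints nothing when the job is not listed, which typically means it has already finished. To limit the listing to your own jobs, `qstat -u username` can be used, though its output columns differ from the default layout assumed here.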
More detailed information on using the Torque PBS and Moab commands to schedule and monitor jobs can be found in the Adaptive Computing online documentation.