Pandora
▶ Table of Contents
Pandora is the new IBM POWER7 cluster at LSU, meant to replace Pelican, which formally stopped accepting jobs on 30 June 2012. Pandora is an eight-node IBM pSeries POWER7 cluster running the AIX operating system.
1. Access to Pandora
The Pandora cluster can be accessed using the domain name pandora.hpc.lsu.edu.
Connecting to Pandora requires client software that supports the Secure Shell (SSH) protocol. Pandora does not accept incoming connections over known insecure protocols such as ftp, telnet, or the "R" commands (rlogin, rsh, etc.).
LSU HPC supports the OpenSSH BSD-licensed client, the SSH.COM commercial client, and PuTTY, the open-source Windows client.
1.1. Using a command-line SSH client
Example session using the OpenSSH command-line client:
$ ssh xuser@pandora.hpc.lsu.edu
NOTICE: This is the LSU computer system, which may be accessed and used only by authorized persons. LSU reserves the right to review and/or monitor system transactions for compliance with its policies and/or applicable law. Upon reasonable cause, LSU may disclose such transactions to authorized persons for official purposes, including criminal and other investigations, and permit the monitoring of system transactions by law enforcement agencies. Access or use of this computer system by any person, whether authorized or unauthorized, constitutes consent to these terms.
All users must login to the interactive node at PANDORA.HPC.LSU.EDU.
xuser@pandora.hpc.lsu.edu's password:
*******************************************************************************
*
* Welcome to Pandora, the new POWER7 cluster provided by HPC@LSU.
*
* Pandora is currently in friendly-user mode. Please let us know of any
* problems or concerns you encounter with using the cluster. Any and all
* jobs are welcome, but we reserve the right to reboot machines without
* prior notice as we work on bringing Pandora up to production status.
*
* NOTE: While Pandora has 128GiB of RAM per node, you should not request
* more than 3.9G per processor, to ensure that as many tasks as possible
* run on the same system. Use ConsumableMemory(3.9 GB) in your LoadLeveler
* requirements statement if you want close to 4GiB of RAM.
*
*******************************************************************************
My shell is /bin/bash
-bash-3.2$
1.2. Using SSH.COM's Windows Client
For users connecting from a Microsoft Windows system using SSH.com's SSH client:
- select "Quick Connect" to display the "connect to remote host" dialogue
- Enter "pandora.hpc.lsu.edu" into the hostname field and your Pandora username into the userid field
- Enter your Pandora password when prompted
2. User Environment
2.1. Passwords
New users should change their initial password as soon as possible after receiving a Pandora account. The use of a strong password is highly encouraged. To reset a forgotten password, visit the HPC password reset page.
3. File Systems
3.1. Home Directory
Upon logging into Pandora, users are placed in their home directory. User home directories have a storage limit of 5 GB, enforced by disk quotas. Users who exceed the 5 GB limit cannot allocate any additional storage until they remove existing data from their home directory. A user's home directory is "global" within Pandora, available from every node. Home directories are backed up nightly, allowing files to be restored after accidental deletion or a disk failure.
3.2. Work Directory
In addition to individual home directories, Pandora users have access to the directory /work/default. Each user has a storage quota of 50 GB within this filesystem.
Like the home directory space, files under /work/default are backed up nightly. As with /home, the /work filesystem is global within Pandora, available on every node. Given its global scope, /work is a suitable space for running user jobs. Users who want to run jobs from /work should change to their own directory, /work/default/$USER:
-bash-3.2$ cd /work/default/$USER
-bash-3.2$ pwd
/work/default/bthakur
In an effort to maximize data storage, files stored within these directories are eligible to be transparently migrated to tape. The files will still exist on the disk in the form of a "stub" file that will respond normally to the typical Unix filesystem commands, such as ls, but attempts to access the data within the file will trigger a restore of the data from tape, increasing the access time significantly.
3.3. Scratch Directory
Each Pandora node possesses local disk space allocated to the /scratch filesystem that is used to temporarily store files during the execution of user jobs. As the /scratch filesystem is intended to store files related to a job during that job's execution, files written to it will be deleted following the completion of the job.
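As a sketch of how a job might use /scratch, the hypothetical bash function below copies an input directory into a per-job directory under /scratch, runs a command there, and copies the results back before the job finishes. The function name, the SCRATCH_ROOT override, and the directory naming (built from LoadLeveler's LOADL_STEP_ID when it is set, else the shell PID) are assumptions for illustration, not a documented Pandora convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stage a job through node-local scratch space.
# Usage: run_in_scratch WORKDIR COMMAND [ARGS...]
# Assumption: SCRATCH_ROOT defaults to /scratch; the per-job subdirectory
# name (user + LOADL_STEP_ID or PID) is invented for this sketch.
run_in_scratch() {
  local workdir=$1
  shift
  local scratch="${SCRATCH_ROOT:-/scratch}/${USER:-$(id -un)}.${LOADL_STEP_ID:-$$}"
  local rc

  mkdir -p "$scratch" || return 1
  # Copy inputs from the global work directory to local scratch.
  cp -R "$workdir"/. "$scratch"/ || return 1

  # Run the command inside scratch and remember its exit status.
  (cd "$scratch" && "$@")
  rc=$?

  # Copy results back, since /scratch is cleaned up when the job ends.
  cp -R "$scratch"/. "$workdir"/
  rm -rf "$scratch"
  return "$rc"
}
```

In a job script this might be called as run_in_scratch /work/default/$USER/run1 poe ./executable, so that only the final results are written to the global /work filesystem.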
4. Software environment
Software on Pandora is managed by SoftEnv. To use system-installed software, add the corresponding keys to your ~/.soft file and run the resoft command:
-bash-3.2$ cat ~/.soft
#
# This is your SoftEnv configuration run control file.
#
# It is used to tell SoftEnv how to customize your environment by
# setting up variables such as PATH and MANPATH. To learn more
# about this file, do a "man softenv".
#
+gromacs-4.5.4
@default
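The edit step can be scripted. The hypothetical bash function below adds a key such as +gromacs-4.5.4 to a .soft file just before the @default line (matching the layout of the example file), and does nothing if the key is already listed. The function name and the choice to insert before @default are assumptions for this sketch; run resoft afterwards to apply the change.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add a SoftEnv key to a .soft file (default ~/.soft),
# placing it before the @default line, unless the key is already present.
soft_add_key() {
  local key=$1
  local file=${2:-$HOME/.soft}
  local tmp

  # Nothing to do if the exact "+key" line already exists.
  if grep -qxF "+$key" "$file" 2>/dev/null; then
    return 0
  fi

  tmp=$(mktemp) || return 1
  if grep -qx '@default' "$file" 2>/dev/null; then
    # Print the new key just before the first @default line.
    awk -v k="+$key" '$0 == "@default" && !done { print k; done = 1 } { print }' "$file" > "$tmp"
  else
    # No @default line: append the key at the end.
    { cat "$file" 2>/dev/null; printf '%s\n' "+$key"; } > "$tmp"
  fi
  mv "$tmp" "$file"
}
```

Typical use would be soft_add_key gromacs-4.5.4 && resoft.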
5. Compilers on Pandora (Fortran, C, C++)
The IBM compilers can be used to compile programs on Pandora.
The IBM compilers recognize the following file extensions:
| File type | Extension(s) |
|---|---|
| C source files | .c |
| C++ source files | .C, .cc, .cp, .cpp, .cxx, .c++ |
| Fortran source files | .f, .F, .f77, .F77, .f90, .F90, .f95, .F95, .f03, .F03 |
| Preprocessed source files | .i |
| Object files | .o |
| Module symbol files | .mod |
| Assembler files | .s |
| Unpreprocessed assembler files | .S |
| Shared objects or libraries | .so |
| Archives or libraries | .a |
| Default executable | a.out |
| Make dependency files | .d |
| Listing files | .lst |
5.1. Building serial executables
To compile serial programs with the IBM compilers, use:
$ xlc hello.c
$ xlC hello.C
$ xlf90 hello.f90
5.2. Building parallel executables
To compile MPI-parallel programs with the IBM compilers, use:
$ mpxlc hello.c
$ mpxlC hello.C
$ mpxlf90 hello.f90
For parallel programs, use the thread-safe compiler invocations xl?_r (for example, xlc_r, xlC_r, xlf90_r).
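The choice of compiler command follows from the file extension and the kind of build. The hypothetical bash function below sketches that mapping: serial builds use xlc, xlC, or xlf90; thread-safe builds append _r; MPI builds add the "mp" prefix seen in mpxlc, mpxlC, and mpxlf90. Routing fixed-form Fortran (.f, .F, .f77, .F77) to xlf/mpxlf rather than xlf90 is an assumption of this sketch, based on xlf90 expecting free-form source by default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the IBM XL compiler command for a source file.
# Usage: xl_compiler FILE [serial|threaded|mpi]
#   serial   -> xlc, xlC, xlf, xlf90
#   threaded -> thread-safe _r variants (xlc_r, xlC_r, ...)
#   mpi      -> "mp" prefix (mpxlc, mpxlC, mpxlf90, ...)
# Assumption: fixed-form Fortran goes to xlf, free-form to xlf90.
xl_compiler() {
  local file=$1
  local mode=${2:-serial}
  local base

  case $file in
    *.c)                                 base=xlc ;;
    *.C|*.cc|*.cp|*.cpp|*.cxx|*.c++)     base=xlC ;;
    *.f|*.F|*.f77|*.F77)                 base=xlf ;;
    *.f90|*.F90|*.f95|*.F95|*.f03|*.F03) base=xlf90 ;;
    *) echo "xl_compiler: unknown source type: $file" >&2; return 1 ;;
  esac

  case $mode in
    serial)   echo "$base" ;;
    threaded) echo "${base}_r" ;;
    mpi)      echo "mp${base}" ;;
    *) echo "xl_compiler: unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For example, $(xl_compiler hello.f90 mpi) -o hello hello.f90 would run mpxlf90.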
5.3. Optimization, debug and diagnostic options
To optimize, debug, or print diagnostic information with the IBM compilers, use:
-O[n] (n = 0, 2, 3, 4, 5), set the optimization level:
  -O0, almost no optimization; best for debugging
  -O2, strong low-level optimization that benefits most programs
  -O3, intense low-level optimization analysis and base-level loop analysis
  -O4, all of -O3 plus detailed loop analysis and basic program analysis
  -O5, all of -O4 plus detailed whole-program analysis at link time
-g, generate debugging information
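One way to keep these levels consistent across builds is to name a few build profiles. The hypothetical bash function below maps invented profile names (debug, default, aggressive, wholeprog) to the flags listed above; the profile names and their groupings are assumptions for this sketch, not IBM conventions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a build profile to XL optimization/debug flags.
#   debug      -> -O0 -g  (almost no optimization, debug info)
#   default    -> -O2     (strong low-level optimization)
#   aggressive -> -O3     (adds loop analysis; may relax strict FP semantics)
#   wholeprog  -> -O5     (whole-program analysis; also pass it when linking)
xl_opt_flags() {
  case ${1:-default} in
    debug)      printf '%s\n' "-O0 -g" ;;
    default)    printf '%s\n' "-O2" ;;
    aggressive) printf '%s\n' "-O3" ;;
    wholeprog)  printf '%s\n' "-O5" ;;
    *) printf 'xl_opt_flags: unknown profile: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

The result is meant to be expanded unquoted so it splits into separate flags, as in xlc $(xl_opt_flags debug) -o hello hello.c.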
5.4. Other useful options
-qpic, instructs the compiler to generate position-independent code
-qlist, generates an object listing
-qreport, instructs the HOT or IPA optimizer to emit a report
-S, produces a .s file containing equivalent assembler source for each source file
6. Running jobs on Pandora
Pandora uses LoadLeveler to schedule user jobs. To request resources, users write a job script, similar to a PBS script on the Linux clusters.
6.1. Submitting jobs on Pandora: Working With LoadLeveler
You can request computer resources to run your programs from the LoadLeveler system. LoadLeveler is a queueing system that distributes jobs, runs them and manages the resources within Pandora.
A typical LoadLeveler script looks like this:
#!/bin/bash
# Put nothing before this header
#@ job_type = parallel
#@ notification = never
#@ notify_user = youremail@domain.tld
#@ output = /work/default/username/$(jobid).out
#@ error = /work/default/username/$(jobid).err
#@ class = workq
#@ checkpoint = no
#@ wall_clock_limit = 2:00:00
#@ node_usage = shared
#@ node = 2
#@ tasks_per_node = 32
#@ requirements = (Arch == "POWER7")
#@ network.MPI_LAPI = sn_single,not_shared,US,HIGH
#@ resources = ConsumableMemory(3500 mb) ConsumableCpus(1)
#@ queue
cd /working/directory
poe executable options
exit 0
The llclass command lists the queues (classes) and the maximum allowed wall-clock time for each. LoadLeveler uses the term 'class' where most people use the term 'queue'; throughout this guide, and when working with LoadLeveler, you should consider the two terms interchangeable.
-bash-3.2$ llclass
Name MaxJobCPU MaxProcCPU Free Max Description
d+hh:mm:ss d+hh:mm:ss Slots Slots
--------------- -------------- -------------- ----- ----- ---------------------
interactive unlimited unlimited 8 8 Queue for interactive jobs;
maximum runtime of 30 mins.
workq unlimited unlimited 32 224 Standard queue for jobs;
maximum runtime of 3 days.
cheme unlimited unlimited 16 96 Queue for Chemical Engg.;
maximum runtime of 3 days.
single unlimited unlimited 0 64 Queue for single-node jobs;
maximum runtime of 3 days.
--------------------------------------------------------------------------------
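The maximum runtimes in the llclass descriptions can be checked before submitting a job. The hypothetical bash functions below convert a wall_clock_limit value written as [[hh:]mm:]ss into seconds and compare it with a class limit; the limits are copied from the llclass output above (30 minutes for interactive, 3 days for workq, cheme, and single) and are assumptions that may change over time.

```bash
#!/usr/bin/env bash
# Hypothetical pre-submission check for wall_clock_limit values.
# Class limits are copied from the llclass descriptions and may change.

# Convert [[hh:]mm:]ss to seconds; awk avoids bash's octal parsing of "08".
to_seconds() {
  printf '%s\n' "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Maximum runtime of a class in seconds; fails for unknown classes.
class_max_seconds() {
  case $1 in
    interactive)        echo $((30 * 60)) ;;
    workq|cheme|single) echo $((3 * 24 * 3600)) ;;
    *) return 1 ;;
  esac
}

# Usage: fits_class CLASS WALL_CLOCK_LIMIT  (exit status 0 if it fits)
fits_class() {
  local max req
  max=$(class_max_seconds "$1") || return 2
  req=$(to_seconds "$2")
  [ "$req" -le "$max" ]
}
```

For example, fits_class workq 2:00:00 succeeds, while fits_class interactive 1:00:00 fails because the interactive class allows only 30 minutes.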
A job script consists of LoadLeveler directives (lines starting with #@) followed by the commands to run. The commonly used directives are:
| Directive | Description |
|---|---|
| #@ error = errfile | Name of the standard error file |
| #@ output = outfile | Name of the standard output file |
| #@ job_name = my_first_job | Name of the job |
| #@ job_type = parallel | Type of job to run: serial or parallel |
| #@ tasks_per_node = number | Number of tasks per node |
| #@ wall_clock_limit = 10:00:00 | Maximum elapsed (wall-clock) time |
| #@ node_usage = shared | Allow other jobs to share the node (not_shared requests exclusive use) |
| #@ node = 2 | Number of nodes requested |
| #@ class = workq | Queue (class) in which to run the job |
| #@ requirements = (Arch == "POWER7") | Run on POWER7 nodes; note that POWER7 is in capitals |
| #@ network.MPI_LAPI = sn_single,not_shared,US,HIGH | Network settings for MPI communication |
| #@ resources = ConsumableMemory(3500 mb) ConsumableCpus(1) | Resources per task |
| #@ queue | Submits the job step; place it after all other #@ directives |

Additional notes: LoadLeveler now requires you to specify consumable resources with the resources directive. You must specify both how much memory each task uses (ConsumableMemory) and how many CPUs each task uses (ConsumableCpus). In general, you will want ConsumableCpus(1), increasing the number of tasks instead according to your code's scalability: 8 tasks for an 8-way job, 32 tasks for a 32-way job, and so on. For example, if you request 1 node with 32 tasks and ConsumableCpus(32), you are requesting 1024 processors in total and 32 times the amount of RAM; Pandora cannot provide this. The network directive can be either network.MPI_LAPI or network.MPI, except for GAMESS jobs, which require network.MPI_LAPI.
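The arithmetic behind these notes can be checked before submitting. The hypothetical bash function below multiplies out a request (nodes, tasks per node, ConsumableMemory in MB per task, ConsumableCpus per task) and flags requests that exceed one node. It assumes both consumable resources are counted per task, 32 CPUs per node (inferred from tasks_per_node = 32 in the sample script), and 128 GiB (131072 MB) of RAM per node (from the login banner).

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for a LoadLeveler resource request.
# Assumptions: ConsumableMemory and ConsumableCpus are per task;
# 32 CPUs and 131072 MB (128 GiB) of RAM per node.
NODE_CPUS=32
NODE_MEM_MB=131072

# Usage: check_request NODES TASKS_PER_NODE MEM_MB_PER_TASK CPUS_PER_TASK
check_request() {
  local nodes=$1 tpn=$2 mem=$3 cpus=$4
  local node_cpus=$((tpn * cpus))
  local node_mem=$((tpn * mem))

  echo "total tasks: $((nodes * tpn)), total CPUs: $((nodes * node_cpus))"
  echo "per node: $node_cpus CPUs, $node_mem MB"

  if [ "$node_cpus" -gt "$NODE_CPUS" ]; then
    echo "too many CPUs per node (max $NODE_CPUS)" >&2
    return 1
  fi
  if [ "$node_mem" -gt "$NODE_MEM_MB" ]; then
    echo "too much memory per node (max $NODE_MEM_MB MB)" >&2
    return 1
  fi
}
```

With the sample script's values, check_request 2 32 3500 1 reports 64 CPUs and 112000 MB per node used out of the assumed 131072 MB, which fits; check_request 1 32 3500 32 reports 1024 CPUs and is rejected, matching the example in the notes.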
6.2. Submit and monitor jobs
To submit a job, use llsubmit:
$ llsubmit myscript.ll
To monitor a job, use llq; add the -s option followed by a job ID for additional details:
-bash-3.2$ llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
pandora1.11277.0         aixuser     7/1 13:45  R  50  workq        pandora002
-bash-3.2$ llq -s pandora1.11277.0
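To use llq output in scripts, the job step IDs can be filtered by state. The hypothetical bash filter below reads llq output on stdin and prints the IDs of steps in a given state (R, running, by default). It assumes the default llq layout of the example above, where the Submitted column spans two fields (date and time), so ST is the fifth whitespace-separated field; other llq output formats would need a different field number.

```bash
#!/usr/bin/env bash
# Hypothetical filter: print job step IDs in a given state (default R)
# from default-format llq output read on stdin.
# Assumption: "Submitted" is two fields (date, time), so ST is field 5.
# Header, separator, and summary lines are skipped because their first
# field does not look like a job step ID (host.job.step).
llq_steps() {
  awk -v st="${1:-R}" '$1 ~ /\.[0-9]+\.[0-9]+$/ && $5 == st { print $1 }'
}
```

For example, llq -u "$USER" | llq_steps R would list your running job steps.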