LSU HPC Allocations Policy
Table of Contents
- System Allocations
- PI Qualification
- Allocation Categories and Process
- Machine Access Policy
- HPC Resources Allocation Committee Members
1. System Allocations
LSU (Louisiana State University) maintains several high performance computing (HPC) systems. The machines currently include an IBM Power7-755 constellation and several Intel Xeon-based x86 clusters. Detailed information on the various systems can be found on the HPC web site. LSU controls access to these resources via formal user account and resource allocation processes. All decisions in these matters are made by the LSU HPC Resource Allocation Committee (HPCRAC), which is made up of faculty representatives from different disciplines across LSU.
A 2-step process is required to gain access to LSU HPC resources, be they computational time or data storage. The first step requires applying for a system user, or login, account. The second step requires associating the user account with an allocation. An allocation provides the means for assigning, tracking, and controlling resource consumption. While the process is similar for both computational time and storage space, there are separate policy documents for each. This particular document will focus on allocations for computational time.
An allocation of computational time is analogous to a bank account. It provides processor time that a user may expend as they see fit on one or more systems. When that time is consumed, no more work can be done until a new allocation is identified. Individuals who are awarded an allocation (called the principal investigator, or PI) have the ability to add users to their allocation account, thus allowing multiple people to use one allocation. The PI consequently retains ultimate responsibility for who uses their allocation and how it is expended.
Computational allocations are awarded in units called service units (SU), where 1 SU corresponds to running a program for 1 wall clock hour on 1 processing core. How cores are associated with a program varies from system to system. All allocations are awarded for 1-year periods, and are considered active until they expire, even if the SU balance has gone to zero. As the policy will outline, there are limits on both the total allocation amounts and the number of active allocations a PI may hold at any given point in time.
2. PI Qualification
For the purpose of computational allocations, only active LSU faculty members and permanent research staff (subject to HPCRAC Chair review and approval) are qualified to serve as a PI. Adjunct and Visiting professors do not qualify. Requesting an allocation involves submission of a project proposal describing the intended usage and providing justification for the resource amounts requested. Actual submission is controlled by web pages on the HPC web site, and varies in complexity with the size of the request.
Once an allocation is awarded, the PI is granted access to tools which allow tracking of SU consumption, as well as the management of additional users who may use it. Associated with each allocation is an account code which is used by the system to determine how to charge for resources consumed. The PI is ultimately responsible for all users they authorize on an allocation, but has the option of formally delegating some other user on the allocation to manage this process in their stead. It becomes the joint responsibility of the PI and the authorized users to make sure the allocation is expended properly, and for the purposes intended.
LSU welcomes members of other institutions who are collaborating with LSU researchers to use LSU resources, but they cannot be the PI requesting standard allocations (Default, Startup or Research) or granting access. They must ask a qualified PI at LSU to sponsor them for a user account and add them to an appropriate existing allocation.
3. Allocation Categories and Process
At the present time, allocations are machine-specific awards, allowing use only on the platform they are awarded for. HPC resources are partitioned by category, and requests go through the HPCRAC. The HPCRAC represents several distinct approval authorities: the Vice Chancellor for Research and Economic Development (VCRED), the Center for Computation & Technology (CCT) Director, the HPCRAC Chair, and the HPCRAC as a panel. This reflects the sponsorship and many intended uses for LSU HPC resources. The resources intended for each use are collected into 5 categories as shown in Table 1.
Table 1. Allocation categories, resource shares, and approval authorities.
|Allocation Category|Available Resources|Allocation Authority|
|---|---|---|
|Economic Development|10%|Vice Chancellor for Research and Economic Development|
|Discretionary|10%|Center for Computation & Technology Director|
|Default Allocation (2,000 SU's)|5%|HPC@LSU Staff|
|Startup Allocation (2,000-50,000 SU's)|15%|HPCRAC Chair|
|Research Allocation (> 50,000 SU's)|60%|HPCRAC|
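The size thresholds for the three standard categories in Table 1 can be sketched as a small lookup. This is an illustration only: the function name `standard_category` is hypothetical, and the handling of a request of exactly 2,000 SU's (where the Default and Startup ranges touch) is an assumption. Economic Development and Discretionary awards are not determined by size and are not covered.

```python
def standard_category(requested_su: int) -> str:
    """Map a requested SU amount to a standard allocation category (Table 1)."""
    if requested_su <= 0:
        raise ValueError("requested_su must be positive")
    if requested_su <= 2_000:
        # Assumption: a need of 2,000 SU's or less is met by the default
        # allocation that every new user account receives.
        return "Default"
    if requested_su <= 50_000:
        # Startup requests go up to and including 50,000 SU's.
        return "Startup"
    # Anything above 50,000 SU's requires a Research proposal.
    return "Research"
```

For instance, `standard_category(30_000)` falls in the Startup range, so the request would go to the HPCRAC Chair rather than the full committee.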
To facilitate management of proposals, they will be assigned to one of the following classes (Table 2).
Table 2. Proposal classes.
|Class|Description|
|---|---|
|Economic|A proposal to assist commercial interests with the adoption of high performance computing as part of their business process. Awarded by the VCRED from the Economic Development resources. Economic allocations may be renewed by the VCRED.|
|Discretionary|A proposal determined to be of value, but lying outside the normal allocation process. Awarded by the CCT Director from the Discretionary resources. Discretionary allocations may be renewed by the director.|
|Default|Every user account, upon creation, receives a default allocation. This allows sufficient time for conventional processing on the systems, and developing information for potential formal proposals. These allocations are limited to 2,000 SU's. Default allocations are not renewable.|
|Startup|A proposal requesting an allocation for the purpose of exploring the value of high performance computing for a new project. A PI is allowed to have a maximum of 2 startup allocations active at any given time. Startup allocations of 2,000 to 50,000 SU's are awarded by the HPCRAC Chair.|
|Research|A request for a large amount of time for a significant research project. May be awarded by simple majority agreement of the HPCRAC. Research requests are limited to 3 million SU's, and a PI may have a total of 5 million SU's active at any given time. Research allocations may be eligible for renewal by the HPCRAC.|
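The per-PI limits in Table 2 amount to a few simple checks. The sketch below is not part of the allocation system; the function and constant names are hypothetical, and it assumes the 5 million SU total is counted over the awarded amounts of a PI's active Research allocations plus the new request.

```python
MAX_ACTIVE_STARTUPS = 2               # at most 2 active Startup allocations per PI
MAX_RESEARCH_REQUEST_SU = 3_000_000   # cap on a single Research request
MAX_ACTIVE_RESEARCH_SU = 5_000_000    # cap on a PI's total active SU's


def can_request_startup(active_startups: int) -> bool:
    """True if the PI is below the limit on simultaneously active Startup allocations."""
    return active_startups < MAX_ACTIVE_STARTUPS


def can_request_research(requested_su: int, active_research_su: int) -> bool:
    """True if a Research request fits both the per-request and the total limit.

    Assumption: the total is the sum of awarded SU's across the PI's active
    Research allocations, including the new request.
    """
    if requested_su > MAX_RESEARCH_REQUEST_SU:
        return False
    return active_research_su + requested_su <= MAX_ACTIVE_RESEARCH_SU
```

Under these assumptions, a PI already holding 2.5 million active SU's could not add a full 3 million SU request, but could add one of 2.5 million.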
All allocations have a duration of one year. Default and Startup allocations may be awarded at any time during the year, and Default allocations, in particular, are awarded on creation of a user account. Research allocations are granted at the beginning of every calendar quarter. Application deadlines for Research allocations are one month prior to the quarterly start dates of January 1, April 1, July 1, and October 1. A PI may request, or be requested, to make a formal presentation to the HPCRAC in conjunction with the proposal submission. At the end of any allocation, a short summary report is required by the HPCRAC. Missing reports may delay proposal processing. Any request for an allocation renewal must include a summary report. The summary should indicate whether the goals of the proposed work were accomplished, describe any major results, and list any publications and presentations that resulted.
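The quarterly Research cycle can be worked out with simple date arithmetic. The sketch below assumes that the four listed dates are quarter start dates, that each deadline is the first day of the preceding month, and that a submission on the deadline day is still on time. The function name `next_research_cycle` is hypothetical.

```python
from datetime import date

# Quarterly start dates for Research allocations as (month, day).
QUARTER_STARTS = [(1, 1), (4, 1), (7, 1), (10, 1)]


def next_research_cycle(today: date) -> tuple[date, date]:
    """Return (deadline, start) for the next cycle whose deadline has not passed."""
    candidates = []
    for year in (today.year, today.year + 1):
        for month, day in QUARTER_STARTS:
            start = date(year, month, day)
            # One month before the 1st of a month is the 1st of the previous month.
            if month == 1:
                deadline = date(year - 1, 12, 1)
            else:
                deadline = date(year, month - 1, 1)
            candidates.append((deadline, start))
    # Earliest cycle that is still open on `today`.
    return min(c for c in candidates if c[0] >= today)
```

Under these assumptions, a proposal prepared in mid-December would target the April 1 start, because the deadline for the January 1 start has already passed.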
Allocations cannot be extended. They are based on estimated resources available during the award period, and those resources effectively dissipate as time passes, whether used or not. The PI is ultimately responsible for assuring allocations are used at a rate needed to support their project over the timeframe of the allocation.
3.1. Standard Allocation Limits, Requests and Renewals
3.1.1. Default Allocations (2,000 SU's)
A default allocation of 2,000 SU's is awarded by HPC staff to every new user account. The computationally-driven activities of any individual whose HPC resource needs exceed 2,000 SU's in a given year may be subject to a review by the HPCRAC Chair. If more time is required, one of the other allocation awards described below must be considered.
3.1.2. Startup Allocations (<50,000 SU's)
Who may apply: Any qualified PI may request a Startup allocation for up to 50,000 SU's. Consideration must be given to the limitation on the number of simultaneous Startup allocations a PI may have active at any one time, as specified in Table 2. The intent is to support low-intensity projects, such as small analysis efforts or course work.
How to apply: An application may be submitted using the HPC web interface. Completing the web form alone is sufficient. The text provided should briefly explain the need for computation time on the selected HPC platform, the intended methodology, and the current state of the codes or application that will be used.
Review and Awards: Startup allocations may be made at any time throughout the year, but their start time is set to the beginning of the allocation quarter they are made in.
Renewals: Upon request, startup allocations may be renewed annually; renewal applications must be accompanied by a summary report including a list of publications and presentations in which the use of LSU HPC resources has been acknowledged.
3.1.3. Research Allocations (>50,000 SU's)
Who may apply: Any qualified PI may request a Research allocation for more than 50,000 SU's, up to the maximum defined in Table 2. Research allocations can be for their own use, or on behalf of a research or operational group in support of a well-defined compute-intensive project. Requests must take into account any limitation given in Table 2 on size and the total SU's a PI may have active at any one time.
How to apply: A request for an allocation requires that a formal proposal be submitted via the HPC web interface to the HPCRAC. In addition to the web form information, the formal proposal must be in PDF format, and should be 5 pages or fewer. The emphasis must be on justifying the computational resources requested. The following outline should be followed:
- Problem Statement – Section limited to 1 page, or less, describing the desired outcomes of the project.
- Background – Section limited to 1 page, or less, describing how the resources will be used to address the problem (e.g. student access for course work, specific models, etc.).
- Methodology – Section limited to 1 page, or less, describing the computational methodology that will be used. This should include the applications required.
- Research Plan – Section limited to 1 page, or less, describing the research schedule, including the anticipated expenditure of granted resources. Allocations are assumed to be uniformly consumed over their lifetime. If this will not be the case, an estimate of expenditure by quarters is required.
- Requirements Analysis – Section limited to 2 pages, or less, detailing the basis for the requested computer time. Large allocations must exhibit an understanding of application efficiency and scaling, and provide accurate estimates of the time requirements.
Please note that the maximum proposal page limit is 5, not including summary reports in the Attachments. The page limit was chosen with an eye to making it relatively easy to compose an allocation request, and to modify and reuse a successful application made to another center. Additional information may be attached as addendums, such as copies of awarded grants which will be supported by the requested resources. The PDF file is limited to a maximum of 2 MB, and will be rejected if the file is over this limit.
Review and Awards: Applications for a Research allocation will undergo competitive peer review and will be allocated quarterly by the HPCRAC. Requests will be granted – in whole or in part – based on the availability of HPC resources and based upon the description of the proposed activities and the appropriate use of technology, with priority given to funded projects. Allocation decisions made by the HPCRAC are deemed final. Unresolved appeals will be directed to the Vice Chancellor for Research & Economic Development (VCRED).
Renewals: Renewal allocations follow the same process as all other allocations. Proposal writers should be aware that both past usage history and submission of progress reports will be considered in the award determination. Applications for allocation renewals should ideally cite peer-reviewed publications that acknowledge LSU HPC resources and only require an updated version of a previously successful application.
3.2. Director's Discretionary Allocation Request (submit to CCT Director)
Each calendar year, up to 10% of the available HPC resources will be allocated at the discretion of the CCT Director and the Chief Information Officer (CIO). Allocations made from this pool will be for a period not to exceed one year. Awardees will be expected to work in conjunction with CCT or HPC@LSU technical staff to ensure that project implementations are in line with and strengthen the CCT's broad interdisciplinary mission. At the termination of a project (or annually, if an award is extended beyond one year) awardees shall provide a summary report of project activities along with a list of publications and presentations in which the use of LSU HPC resources has been acknowledged.
3.3. Economic Development or 'On Demand' Allocation Request (submit to VCRED)
The VCRED, in consultation with the chair of the HPCRAC and the CCT Director, may grant Economic Development or On Demand access to HPC resources for high priority projects, such as a render farm for economic development purposes, hurricane storm surge modeling, or other possible operational responsibilities that are deemed sufficiently different from standard research proposals. Each calendar year, up to 10% of the available HPC resources may be allocated by the VCRED. At the termination of a project (or annually, if an award is extended beyond one year) awardees shall provide a summary report of project activities along with a list of publications and presentations in which the use of LSU HPC resources has been acknowledged.
3.4. Allocation Management
An allocation is considered valid so long as a positive resource balance remains, and the expiration date has not been exceeded. Once an allocation expires, or has been fully consumed, user accounts will be blocked from submitting work against the allocation. There is no mechanism for extending an allocation beyond one year, nor for adding resources once an allocation has been expended.
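The validity rule above reduces to a two-part test. The following is a minimal sketch, not the management system's actual logic; the function name is hypothetical, and it assumes an allocation can still be used on its expiration date.

```python
from datetime import date


def is_allocation_valid(balance_su: float, expires: date, today: date) -> bool:
    """Valid only while SU's remain AND the expiration date has not passed.

    Assumption: the allocation is still usable on the expiration date itself.
    """
    return balance_su > 0 and today <= expires
```

Either condition failing blocks job submission: an allocation with SU's left is unusable after it expires, and an unexpired allocation is unusable once its balance reaches zero.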
User accounts must be associated with a valid allocation, and if not, will be retained for a maximum of 1 year pending authorization against a renewed or different allocation. With these restrictions in mind, the PI is required to use the tools provided to monitor system usage and control authorization of project member accounts. PIs are strongly advised to budget their usage carefully throughout the year. Automatic reminder emails will be sent by the management system as an allocation nears expiration. PIs are ultimately responsible for assuring that a current and actively monitored management email address has been assigned to each allocation.
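One way to budget usage is to compare actual consumption against the uniform-consumption assumption stated in the Research Plan item of the proposal outline. The sketch below is only an illustration of that comparison; the function names are hypothetical and it is not a description of the monitoring tools provided to PIs.

```python
from datetime import date


def expected_usage(awarded_su: float, start: date, end: date, today: date) -> float:
    """SU's that uniform consumption would have used by `today`."""
    total_days = (end - start).days
    if total_days <= 0:
        raise ValueError("end must be after start")
    # Clamp so dates outside the award period give 0 or the full award.
    elapsed_days = min(max((today - start).days, 0), total_days)
    return awarded_su * elapsed_days / total_days


def usage_gap(awarded_su: float, used_su: float,
              start: date, end: date, today: date) -> float:
    """Positive: ahead of the uniform pace; negative: behind it."""
    return used_su - expected_usage(awarded_su, start, end, today)
```

A large positive gap suggests the allocation may run out before it expires; a large negative gap suggests awarded time may go unused, since allocations cannot be extended.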
At the end of any allocation, a short summary report must be submitted to the allocation committee. Failure to submit this report may be used in the consideration of future allocations. The report may simply reference any formal publications and presentations that resulted from using HPC resources, or provide a high level overview of what was accomplished.
3.5. Early Allocation Access
If a PI who has already been awarded a Research allocation by HPCRAC puts in a new request, and there is a good reason to start the project before the next cycle, then the HPCRAC Chair can instruct the HPC@LSU staff to award as much as 25% of the project request immediately. Justification must be provided in the renewal proposal. The HPCRAC will be made aware of this action and its reasons by the HPCRAC Chair using the "LSU HPC Allocations" email list.
4. Machine Access Policy
4.1. Job Queueing
Various workload balancing algorithms are used to determine how jobs are assigned resources on a given machine. The way a job is handled is determined by the job queue it is submitted to. Efficient use of the queuing system requires that users request runtimes consistent with estimated runtimes of their jobs. In particular, requesting more time than is necessary for a particular job can lead to inefficient and unfair queuing. Therefore, users that routinely request more time than is needed for their jobs are subject to a priority penalty that will lower the priority of their jobs. Each system sets a maximum number of jobs that a single user may have running without special permission (see below). There is no limit to the number of jobs that a particular user may have queued. Users that wish to obtain a higher priority for their jobs may use special priority queues (see below).
The available processors are currently divided into 2 architecture-specific groups: IBM P7-755 systems running AIX, and Intel x86 systems running Linux. The processors in each group are further subdivided into preemptory and dedicated pools. Certain mission-critical applications, such as storm surge prediction during a hurricane threat, are granted immediate access to processors in the preemptory pool. Processors in the dedicated pool are used to run all other job types. The processors are accessed through different job queues. There are 5 job queues which use different combinations of the processor pools, and allow for different job characteristics.
- Preempt Queue: The preempt queue controls access to the preemptory pool. Authorized applications submitted to this queue will cause the termination of all other user applications running on preemptory nodes.
- Checkpt Queue: The checkpt queue controls access to nodes in both the dedicated and preemptory pools. Jobs running in this queue may be subject to termination by the preempt queue, and thus are implicitly assumed to support restarts based on periodically saved information. No refunds of lost SU's are offered if jobs in the checkpt queue are terminated by preemption. Users running jobs without restart capability assume this risk. However, the benefits from using this queue include access to larger numbers of nodes, and/or faster throughput, depending on how busy the queue is.
- Workq: The workq queue controls access to nodes in the dedicated pool. Jobs in the work queue will run until they terminate as planned, their requested run time has expired, or they stop due to an abnormal system failure. Jobs which are terminated due to system errors beyond the user's control may be subject to refund of expended SU's. Poor planning is not considered grounds for a refund.
- Interactive Queue: The interactive queue gives real-time access to jobs for on-line analysis or debugging, but only allows very short run times. It supports development work, but not production.
- Priority Queue: The priority queue controls nodes in the dedicated pool, but allows applications to be given higher priority with prior approval. Approval may be granted during training sessions, for demonstration purposes, or other special needs. The SU's charged will be adjusted by a factor of 1.3, and no more than 20% of an allocation may be expended in the queue. Requests for priority access should be directed to email@example.com. This queue does not impact other running user jobs, but will delay the start of lower priority jobs already in the queue.
Note: The named queues do not necessarily exist on all machines, and the maximum time allowed in the queues will vary from machine to machine.
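The priority queue's charging rule (a factor of 1.3, with no more than 20% of an allocation expended there) can be sketched as follows. The function names are hypothetical, and it is an assumption that the 20% cap is measured in SU's as charged, that is, after the 1.3 factor is applied.

```python
PRIORITY_FACTOR = 1.3       # SU's charged are adjusted by this factor
PRIORITY_SHARE_CAP = 0.20   # at most 20% of an allocation may be spent in the queue


def priority_charge(cores_reserved: int, wall_hours: float) -> float:
    """SU's charged for a job run in the priority queue."""
    return cores_reserved * wall_hours * PRIORITY_FACTOR


def fits_priority_cap(awarded_su: float, already_charged_su: float,
                      new_charge_su: float) -> bool:
    """Assumption: the cap applies to SU's as charged (after the 1.3 factor)."""
    return already_charged_su + new_charge_su <= PRIORITY_SHARE_CAP * awarded_su
```

For example, a 16-core job running for 10 hours would be charged about 208 SU's in the priority queue, compared with 160 SU's in a standard queue.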
Currently disk space usage is controlled via user quotas rather than on a per-project basis. Storage may become an allocated resource in the future. At such time, a request for storage space will be required in the allocation request. At the current time, an estimate of required space is requested.
4.3. Special Requests
A request for special access to LSU HPC machines (such as usage of all nodes on a machine or exceptionally long runs) must be explicitly stated in the proposal for HPC resources. Appeals to the decision of the HPCRAC may be made to the VCRED.
5. HPC Resources Allocation Committee Members
The proposed HPCRAC members are shown in Table 3. The CCT Director, in consultation with the VCRED, appoints the Chair of HPCRAC. The HPCRAC is the chartered approval authority for allocating HPC resources, and is charged with the task of adopting the HPC Resources Allocations Policy and reporting its activities and recommendations to the VCRED.
Table 3. HPCRAC members.
|Name|Department|Email|
|---|---|---|
|Honggao Liu (Chair)|CCT|firstname.lastname@example.org|
|Jim Q. Chen|Civil & Environmental Engineering|email@example.com|
|Shawn W. Walker|Mathematics|firstname.lastname@example.org|
|Krishnaswamy Nandakumar|Chemical Engineering|email@example.com|
|Jeremy Brown|Biological Sciences|firstname.lastname@example.org|
Summary report: This report should be a maximum of 5 pages and should include the following information:
- Principal user information [PI] (name, status, department, phone, email, institution [if different from LSU])
- Summarize the nature of the LSU-sponsored research in a non-technical fashion, suitable for public consumption. (Approximately one page; no more than two pages)
- Describe any potential applications of the research to industry or government.
- Describe any use of this allocation to encourage education in computational science, and particularly the level of student involvement and any HPC elements incorporated into formal courses.
Definition of Service Unit (SU): Currently, one SU corresponds to one hour of wall-clock time on one processing core. A single machine will have multiple nodes (individual servers) available, and multiple cores within each node. This allows many cores to be used for a single parallel processing job, but can lead to charges that are not obvious. For example, running for 1 hour using 8 cores on an 8-core node consumes 8 SU's. On the other hand, running for 1 hour on 1 core of the same 8-core node, which allows the single core to access all of the node memory, also consumes 8 SU's. In simple terms, the number of cores that are reserved for a job, and hence are unavailable to others, is the number used to calculate SU usage.
HPC resources at LSU will be allocated/charged according to the number of “service units (SU's)” required/used, where:
# SU's = m * #Nodes * Wall_Time,
m = number of processing cores per Node;
Wall_Time = Total Wall Clock Hours.
For example, if a machine has 4 processing cores per Node (m = 4), a program that ran for 24 hours on 32 nodes required 4*32*24 = 3072 SU's.
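The SU formula and both worked examples can be reproduced with a short sketch. The function names are hypothetical, and `nodes_reserved` assumes that nodes are always reserved whole, as in the 8-core example; how cores are associated with a job varies from system to system.

```python
import math


def su_charge(cores_per_node: int, nodes: int, wall_hours: float) -> float:
    """SU's = m * #Nodes * Wall_Time, with m = processing cores per node."""
    return cores_per_node * nodes * wall_hours


def nodes_reserved(cores_used: int, cores_per_node: int) -> int:
    """Whole nodes needed for a job (assumes nodes are reserved whole)."""
    return math.ceil(cores_used / cores_per_node)


# 24 hours on 32 nodes with 4 cores per node.
print(su_charge(4, 32, 24))                    # 3072
# 1 hour on a single core of an 8-core node still reserves the whole node.
print(su_charge(8, nodes_reserved(1, 8), 1))   # 8
```

The second call shows why a single-core job is charged 8 SU's per hour: the charge follows the cores reserved, not the cores actually used.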
Last revised: 5 September 2014