LONI User Allocation and Account Policy
Last revised: 21 Aug 2013.
Table of Contents
- Accounts and Allocations
- The Allocation Process
- Machine Access Policy
- Resource Allocation Committee Members
1. Accounts and Allocations
LONI (Louisiana Optical Network Initiative) maintains several high-performance computing (HPC) systems at its member sites, interconnected by a high-speed optical network. The machines currently include IBM Power5-575 systems and Intel Xeon-based clusters. Detailed information can be found on the LONI web site. LONI controls access to these resources via a formal user allocation and account creation process. All decisions in these matters are made by a committee made up of representatives from each of the LONI member organizations.
Gaining access to LONI resources, be it CPU time or data storage, involves a two-step process. The first step requires the submission of a proposal outlining the resource requirements. Approval of the request results in the award of an allocation. Allocations are awarded on a per-project basis to the principal investigator (PI) who submitted the proposal. Full-time faculty and research staff at LONI Member and Associate institutions are eligible to serve as a PI. The LONI Management Council reserves the right to designate others for PI eligibility. Allocations for CPU time are blocks of Service Units (SU) awarded for use on a specific machine, while storage allocations are awarded in terms of Gigabytes (1 billion bytes) of disk space. Currently, only extended storage on the /project file space requires an allocation.
The second step involves the authorization of system user accounts for the expenditure of allocated resources. In order to run a program, a user must be able to charge expenditure to an allocation. To accomplish this, tools are made available to the PI to control authorization of users. The PI associates an email address with an allocation to which all user account requests are sent for authorization. The PI is responsible for all authorizations, but may formally delegate someone to manage this process. Once a user account is authorized to use an allocation, the user is allowed to submit jobs or otherwise expend the allocation. This implies that a user must be authorized for at least one non-expired allocation before the systems can be used. Likewise, a user account may be authorized to use multiple allocations. It becomes the responsibility of the PI and user to make sure the appropriate allocation is charged for work.
Currently, one SU corresponds to one hour of wall-clock time on one processing core. A single machine has multiple nodes (individual servers) available, and multiple cores within each node. This allows many cores to be used for a single parallel processing job, but can lead to charges that are not obvious. For example, running for 1 hour using 8 cores on an 8-core node consumes 8 SUs. On the other hand, running for 1 hour on 1 core of the same 8-core node, which allows the single core to access all of the node memory, also consumes 8 SUs. In simple terms, the number of cores that are reserved for a job, and hence are unavailable to others, is the number used to calculate SU usage.
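The charging rule above can be sketched in a few lines of Python. This is only an illustration, assuming that a job always reserves whole nodes (which is what the 8-core example implies). The function name `service_units` and its parameters are hypothetical and are not part of any LONI tool.

```python
import math


def service_units(cores_requested: int, cores_per_node: int, wall_hours: float) -> float:
    """Estimate the SUs charged for a job.

    Assumption (not stated as a general rule in the policy): the scheduler
    reserves whole nodes, so every core on a reserved node is charged.
    """
    if cores_requested < 1 or cores_per_node < 1 or wall_hours < 0:
        raise ValueError("cores must be >= 1 and wall_hours must be >= 0")
    # Round the request up to a whole number of nodes.
    nodes_reserved = math.ceil(cores_requested / cores_per_node)
    cores_reserved = nodes_reserved * cores_per_node
    # One SU is one hour of wall-clock time on one reserved core.
    return cores_reserved * wall_hours


# Both examples from the text cost the same on an 8-core node.
print(service_units(8, 8, 1))  # 8 cores for 1 hour
print(service_units(1, 8, 1))  # 1 core for 1 hour, but the whole node is reserved
```

Under this assumption, a job that needs one more core than a node provides is charged for two full nodes, so it is worth matching core requests to node sizes.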
2. The Allocation Process
LONI resources are partitioned by category, and all allocation proposals go through the LONI Resource Allocation Committee (LRAC). The LRAC represents three distinct approval authorities: individual member institutions; the LONI Management Council (LMC), represented by the LONI director; and the LRAC as a panel. Table 1 shows the distribution of LONI resources by category:
|Category||Available Resources||Allocation Authority|
|Louisiana Non-Member LONI||5%||LRAC|
|Small Allocation (< 50,000 SUs)||30%||By member, 5% per institution.|
|Large Allocation (> 50,000 SUs)||45%||LRAC|
Allocations are granted at the beginning of every calendar quarter and have a duration of one year. Application deadlines are one month prior to the start date, as summarized in Table 2.
|Allocation Start Date||Application Deadline|
|January 1||December 1 (prior year)|
|April 1||March 1|
|July 1||June 1|
|October 1||September 1|
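The schedule in Table 2 is regular enough to compute. The following sketch finds the next cycle that can still be applied for on a given day. It assumes that an application submitted on the deadline date itself is on time, which the policy does not state. The name `next_allocation_cycle` is hypothetical.

```python
from datetime import date
from typing import Tuple

# Allocations start on the first day of each calendar quarter.
QUARTER_START_MONTHS = (1, 4, 7, 10)


def next_allocation_cycle(today: date) -> Tuple[date, date]:
    """Return (application_deadline, start_date) for the earliest open cycle.

    Assumption: the deadline is inclusive (applying on the deadline is allowed).
    """
    for year in (today.year, today.year + 1):
        for month in QUARTER_START_MONTHS:
            start = date(year, month, 1)
            # The deadline is one month before the start date.
            if month == 1:
                deadline = date(year - 1, 12, 1)
            else:
                deadline = date(year, month - 1, 1)
            if deadline >= today:
                return deadline, start
    raise RuntimeError("no cycle found")  # not reachable for valid dates


deadline, start = next_allocation_cycle(date(2013, 8, 21))
print(deadline, start)
```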
A PI may request, or be asked, to make a formal presentation to the allocation authority in conjunction with a proposal submission. At the end of any allocation, a summary report must be submitted to the allocation authority. Any request for allocation renewal, including renewals of startup allocations, must include a summary report. The summary should indicate whether the goals of the proposed work were accomplished, describe any major results, and list any publications that resulted.
To facilitate management, proposals are identified as belonging to one of several classes (Table 3).
|Class||Description|
|Economic||A proposal to assist commercial interests with the adoption of high performance computing as part of their business process. Awarded by the LMC from the Economic Development resources.|
|Discretionary||A proposal determined to be of value, but lying outside the normal allocation process. Awarded by the LMC from the Discretionary resources.|
|Small||A proposal requesting a Small allocation. May be awarded from LRAC or institution resources.|
|Startup||A proposal requesting an allocation for the purpose of exploring the value of high performance computing for a new project. This may be awarded in any category, within a member's 5% resource pool. Only 2 startup allocations may be active at any time per PI. Serial application for Startup Allocations is discouraged as the intent is to use them to develop proposal material for one of the other allocation types.|
|Large||A proposal requesting a Large allocation. May be awarded by the LRAC against Large or Non-Member resources. Large requests are limited to 4M SU, and a PI may have a total of 6M SU active at any given time. Only faculty members of LONI member and Louisiana associate institutions may serve as the PI for large allocations.|
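The numeric limits in Table 3 can be checked before a proposal is submitted. The sketch below is illustrative only. It assumes that the 6M SU ceiling counts the SUs of all of the PI's currently active allocations plus the new request, which is one reading of the policy. All names are hypothetical.

```python
from typing import List

LARGE_REQUEST_LIMIT_SU = 4_000_000  # maximum size of a single Large request
ACTIVE_LIMIT_SU = 6_000_000         # maximum SUs a PI may have active at once
MAX_ACTIVE_STARTUPS = 2             # maximum active Startup allocations per PI


def large_request_problems(requested_su: int, active_su: int) -> List[str]:
    """List the limits a Large request would exceed (empty list if none).

    Assumption: active_su is the sum awarded across the PI's active allocations.
    """
    problems = []
    if requested_su > LARGE_REQUEST_LIMIT_SU:
        problems.append("request exceeds the 4M SU limit for a Large allocation")
    if active_su + requested_su > ACTIVE_LIMIT_SU:
        problems.append("request would put the PI above 6M active SUs")
    return problems


def may_request_startup(active_startups: int) -> bool:
    """A PI may hold at most two active Startup allocations."""
    return active_startups < MAX_ACTIVE_STARTUPS


print(large_request_problems(3_000_000, 4_000_000))
```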
2.1 Proposal Requirements
A request for an allocation requires that a formal proposal be submitted via the LONI web interface to the LRAC. The proposal is expected to be 5 pages or fewer, and to concentrate on justifying the computational resources requested. The following outline should be followed:
- Problem Statement - section limited to 1 page, or less, describing the desired outcomes of the project.
- Background - section limited to 1 page, or less, describing how the resources will be used to address the problem (e.g. student access for course work, specific models, etc.).
- Methodology - section limited to 1 page, or less, describing the computational methodology that will be used. This should include the applications required.
- Research Plan - section limited to 1 page, or less, describing the research schedule, including the anticipated expenditure of granted resources. Allocations are assumed to be uniformly consumed over their lifetime. If this will not be the case, an estimate of expenditure by calendar quarter is required.
- Requirements Analysis - section limited to 2 pages, or less, detailing the basis for the requested computer time. Requests for large allocations must exhibit an understanding of application efficiency and scaling, and must provide accurate estimates of the SU requirements.
- Attachments - must include summary reports from previous allocations.
Please note that the maximum proposal page limit is 5, not including summary reports in the Attachments. The page limit was chosen with an eye to making it relatively easy to compose an allocation request, or to modify and reuse a successful application made to another center. Additional information may be attached as addendums, such as copies of awarded grants which will be supported by the requested resources.
Project allocations are competitively reviewed and granted based upon the description of the proposed research and the use of technology. For Large Allocations, priority is given to funded research. All decisions made by the LRAC are deemed final. Appeals can be directed to the LMC or the member institution council representative.
Renewal allocations follow the same process as all other allocations. Proposal writers should be aware that both past usage history and submission of progress reports will be considered in the award determination. Applications for allocation renewals should ideally cite peer-reviewed publications that acknowledge LONI resources, and need only be an updated version of a previously successful application.
2.2 Allocation Management
An allocation is considered valid so long as a positive resource balance remains and the expiration date has not passed. Once an allocation expires, or has been fully consumed, user accounts will be blocked from submitting work against the allocation. There is no mechanism for extending an allocation beyond one year, nor for adding resources once an allocation has been expended.
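The validity rule is a simple conjunction of two conditions, as the following sketch shows. It assumes that work may still be charged on the expiration date itself. The `Allocation` class and its field names are hypothetical and do not reflect the actual LONI management system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Allocation:
    name: str          # hypothetical project label
    balance_su: float  # SUs remaining
    expires: date      # expiration date


def is_valid(alloc: Allocation, today: date) -> bool:
    """Valid means a positive balance AND an expiration date not yet passed."""
    return alloc.balance_su > 0 and today <= alloc.expires


def chargeable(allocs: List[Allocation], today: date) -> List[str]:
    """Names of the allocations a user could still submit work against."""
    return [a.name for a in allocs if is_valid(a, today)]


allocs = [
    Allocation("proj_a", 1200.0, date(2013, 12, 31)),
    Allocation("proj_b", 0.0, date(2014, 6, 30)),      # fully consumed
    Allocation("proj_c", 500.0, date(2013, 6, 30)),    # expired
]
print(chargeable(allocs, date(2013, 8, 21)))
```

A user whose list comes back empty cannot run jobs, since at least one non-expired allocation with a remaining balance is needed.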
User accounts must be associated with a valid allocation. Accounts that are not will be retained for a maximum of 1 year pending authorization against a renewed or different allocation. With these restrictions in mind, the PI is required to use the tools provided to monitor system usage and control authorization of project member accounts. PIs are strongly advised to budget their usage carefully throughout the year. Automatic reminder emails will be sent by the management system as an allocation nears expiration. PIs are ultimately responsible for ensuring that a current and actively monitored management email address has been assigned to each allocation.
At the end of any allocation, a short summary report must be submitted to the allocation committee. Failure to submit this report may be taken into account when future allocations are considered. The report may simply reference any formal publications that resulted from using LONI resources, or provide a high-level overview of what was accomplished.
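One way to budget through the year is to compare actual spending with the uniform consumption that Section 2.1 assumes by default. The sketch below computes that linear baseline. It is an illustration rather than a description of the monitoring tools provided to PIs, and the function name is hypothetical.

```python
from datetime import date


def expected_su_spent(award_su: float, start: date, end: date, today: date) -> float:
    """SUs that would be spent by `today` if usage were uniform over the lifetime."""
    total_days = (end - start).days
    if total_days <= 0:
        raise ValueError("end must be after start")
    # Clamp elapsed time to the allocation's lifetime.
    elapsed_days = min(max((today - start).days, 0), total_days)
    return award_su * elapsed_days / total_days


# A 365,000 SU award running from 1 Jan 2013 to 1 Jan 2014, checked on 2 Jul 2013.
baseline = expected_su_spent(365_000, date(2013, 1, 1), date(2014, 1, 1), date(2013, 7, 2))
print(baseline)
```

Spending well below the baseline late in the year suggests SUs will expire unused, while spending well above it suggests the allocation may run out early.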
2.3 Early Allocation Access
If a PI who already holds a large allocation awarded by the LRAC submits a new request, and there is good reason to start the project before the next cycle, the local representative may direct staff to award up to 25% of the requested amount. Justification must be provided in the renewal proposal. The local representative will inform the LRAC of this action and its reasons via the "LONI Allocations" listserv.
3. Machine Access Policy
3.1 Job Queueing
Various workload balancing algorithms are used to determine how jobs are assigned resources on a given machine. The way a job is handled is determined by the job queue it is submitted to. Efficient use of the queuing system requires that users request runtimes consistent with the estimated runtimes of their jobs. In particular, requesting more time than is necessary for a particular job can lead to inefficient and unfair queuing. Therefore, users who routinely request more time than is needed for their jobs are subject to a “priority penalty” that will lower the priority of their jobs. Each system sets a maximum number of jobs that a single user may have running without special permission (see below). There is no limit to the number of jobs that a particular user may have queued. Users who wish to obtain a higher priority for their jobs may use special priority queues (see below).
The available processors are currently divided into two architecture-specific groups: IBM P5-575 systems running AIX, and Intel x86 systems running Linux. The processors in each group are further subdivided into preemptory and dedicated pools. Certain mission-critical applications, such as storm surge prediction during a hurricane threat, are granted immediate access to processors in the preemptory pool. Processors in the dedicated pool are used to run all other job types. The processors are accessed through different job queues.
There are 5 job queues which use different combinations of the processor pools, and allow for different job characteristics.
3.2.1. Preempt Queue
The preempt queue controls access to the preemptory pool. Authorized applications submitted to this queue will cause the termination of all other user applications running on preemptory nodes.
3.2.2. Checkpt Queue
The checkpt queue controls access to nodes in both the dedicated and preemptory pools. Jobs running in this queue may be terminated by the preempt queue, and are therefore assumed to support restarts based on periodically saved information. No refunds of lost SUs are offered if jobs in the checkpt queue are terminated by preemption. Users running jobs without restart capability assume this risk. However, the benefits of using this queue include access to larger numbers of nodes, and/or faster throughput, depending on how busy the queue is.
3.2.3. Workq Queue
The workq queue controls access to nodes in the dedicated pool. Jobs in the workq queue will run until they terminate as planned, their requested run time has expired, or they stop due to an abnormal system failure. Jobs that are terminated due to system errors beyond the user's control may be eligible for a refund of expended SUs. Poor planning is not considered grounds for a refund.
3.2.4. Interactive Queue
The interactive queue gives real-time access to jobs for on-line analysis or debugging, but only allows very short run times. It supports development work, but not production.
3.2.5. Priority Queue
The priority queue controls nodes in the dedicated pool, but allows applications to be given higher priority with prior approval. Approval may be granted during training sessions, for demonstration purposes, or other special needs. The SUs charged will be adjusted by a factor of 1.3, and no more than 20% of an allocation may be expended in the queue. Requests for priority access should be directed to email@example.com. This queue does not impact other running user jobs, but will delay the start of lower priority jobs already in the queue.
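The two numeric rules for the priority queue can be combined into a quick check before submitting. The sketch below assumes that the 20% cap is measured against the original award and that the adjusted (1.3x) charges count toward it. The policy does not spell out either detail, and the function names are hypothetical.

```python
PRIORITY_FACTOR = 1.3      # SUs charged are multiplied by this factor
PRIORITY_SHARE_CAP = 0.20  # at most 20% of an allocation may be spent in the queue


def priority_charge(base_su: float) -> float:
    """SUs charged for a priority-queue job that would normally cost base_su."""
    return base_su * PRIORITY_FACTOR


def fits_priority_cap(award_su: float, priority_su_spent: float, base_su: float) -> bool:
    """True if the job keeps priority-queue spending within the 20% cap.

    Assumptions: the cap applies to the original award, and adjusted charges
    (not base SUs) are what count against it.
    """
    cap = award_su * PRIORITY_SHARE_CAP
    return priority_su_spent + priority_charge(base_su) <= cap


# A job worth 10,000 base SUs against a 100,000 SU award.
print(priority_charge(10_000))
print(fits_priority_cap(100_000, 0, 10_000))
print(fits_priority_cap(100_000, 10_000, 10_000))
```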
3.2.6. Queue Availability
The named queues do not necessarily exist on all machines, and the maximum time allowed in the queues will vary from machine to machine.
Currently disk space usage is controlled via user quotas rather than on a per-project basis. Storage may become an allocated resource in the future. At such time, a request for storage space will be required in the allocation request. At the current time, an estimate of required space is requested.
3.4. Special Requests
A request for special access to LONI machines (such as usage of all nodes on a machine or exceptionally long runs) must be explicitly stated in the proposal for LONI resources. Appeals to the decision of the LRAC may be made to the LONI Management Council.
4. Resource Allocation Committee Members
The current LRAC members are shown in Table 4.
Each member holds approval authority for their respective institution. The LONI Director is also a member, and holds approval authority for the LONI Management Council.