LSU HPC Storage Policy
1. Storage Systems
Available storage on HPC@LSU high performance computing machines is divided into four file systems (Table 1).
| File System | Description |
|---|---|
| Scratch | Temporary storage on each node that is available only during job execution. |
| Work | Shared storage provided to each user for input, intermediate, and output files of a job. |
| Home | Persistent storage provided to each active user account. |
| Project | Persistent storage provided by request for a limited time to hold large amounts of project-specific data. |
All of the file systems share common characteristics. When they are too full, system performance begins to suffer. Similarly, placing too many individual files in a directory degrades performance. System management activities are aimed at keeping the different file systems below 85% of capacity. It is strongly recommended that users hold file counts below 1,000, and never exceed 10,000, per directory. If atypical usage begins to impact performance, individual users may be contacted and asked to help resolve the issue. When performance or capacity issues become significant, system managers may intervene by requiring users to offload files, stop jobs, or take other actions to ensure the continued operation of the system.
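The per-directory file count guideline above can be checked with a short script. The following is a minimal sketch, not an official HPC@LSU tool: the function name `crowded_directories` and the constant names are hypothetical, and only the 1,000 and 10,000 figures come from this policy.

```python
import os

RECOMMENDED_MAX = 1_000   # policy: hold file counts below this per directory
HARD_MAX = 10_000         # policy: never exceed this per directory


def crowded_directories(root, threshold=RECOMMENDED_MAX):
    """Return (path, file_count) pairs for directories under root that
    hold at least `threshold` files, largest counts first."""
    crowded = []
    for dirpath, _dirnames, filenames in os.walk(root):
        count = len(filenames)  # files directly in this directory only
        if count >= threshold:
            crowded.append((dirpath, count))
    return sorted(crowded, key=lambda item: item[1], reverse=True)
```

Calling `crowded_directories("/path/to/your/work/dir")` lists directories that have reached the recommended limit; passing `threshold=HARD_MAX + 1` lists only those that exceed the hard limit.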
Management makes every effort to avoid data loss. In the event of a failure, every effort will be made to recover data. However, the volume of data housed makes it impractical to provide system-wide data backup. Users are expected to safeguard their own data by making sure that all important code, scripts, documents and data are transferred to another location in a timely manner. The use of inexpensive, high capacity external hard drives attached to a workstation is highly recommended for individual user backup.
2. File System Details
2.1. Scratch
Scratch space is available on all systems, and to all users during the course of a job, but files are subject to deletion once a job ends. The size of this file system will vary from system to system, and possibly across nodes within a system. This is the preferred place to put any intermediate files required while a job is executing. Users should not have any expectation that files will exist after a job terminates, and are expected to move the data from Scratch to their Work or Home directory as part of the clean up process in their job script.
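The clean-up step described above might look like the following when a job's final stage is written in Python. This is a sketch under stated assumptions: the function name `save_results` and the default file patterns are hypothetical, and the actual Scratch and Work paths differ by system.

```python
import shutil
from pathlib import Path


def save_results(scratch_dir, work_dir, patterns=("*.out", "*.dat")):
    """Copy files matching `patterns` from scratch_dir into work_dir and
    return the list of copied paths. The patterns are illustrative only."""
    scratch = Path(scratch_dir)
    dest = Path(work_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for src in sorted(scratch.glob(pattern)):
            if src.is_file():
                target = dest / src.name
                shutil.copy2(src, target)  # copy2 also preserves timestamps
                copied.append(target)
    return copied
```

Files are copied rather than moved so that a failed transfer does not leave a partial result as the only copy; Scratch contents are subject to deletion after the job ends in any case.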
2.2. Work
Work is the primary file storage area to utilize when running jobs. Work is a common file system that all system nodes have access to. It is ideal for input files, checkpoint files and output. A storage quota may be enforced per user on the Work space file system, and usage allowed will vary by system, as shown in Table 3. This does not imply that each user should intend to fully consume this amount, since the quota system uses overbooking to optimize available space. This quota serves as a hard upper limit to prevent a single user from disrupting performance of the entire system. Files on Work will persist indefinitely. Users should not have any expectation of backup. The safekeeping of data is the responsibility of the user.
On systems which do not enforce a quota on Work space, the file system is fully shared. However, if capacity approaches 85%, the files are subject to purge by management. The purge process targets the oldest files first, and continues until the capacity drops to an acceptable level. The purge process will normally occur once per month, as necessary. On systems with no Work quota, persistent storage for large amounts of data is provided on the Project file system.
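To illustrate how an oldest-first purge behaves, the following sketch selects files by age until usage falls to a target level. It is not the tool used by system management. The function name `select_for_purge` is hypothetical, and treating 85% as the "acceptable level" is an assumption, since the policy does not state the stopping point.

```python
def select_for_purge(files, capacity_bytes, used_bytes, target_fraction=0.85):
    """Pick files to purge, oldest first, until usage is at or below
    target_fraction of capacity.

    files: iterable of (path, size_bytes, mtime) tuples.
    target_fraction: assumed stopping point (not stated in the policy).
    """
    to_purge = []
    # Smallest mtime means the oldest file, so sort ascending by mtime.
    for path, size, _mtime in sorted(files, key=lambda f: f[2]):
        if used_bytes <= target_fraction * capacity_bytes:
            break
        to_purge.append(path)
        used_bytes -= size
    return to_purge
```

The practical consequence for users is that files left untouched the longest are the first to go, regardless of their size or importance.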
Should the need arise, users may request an increase in their Work quota for a reasonable period of time. A request containing a justification for the increased storage should be sent to sys-help@loni.org. The email must include the name of the PI, a valid contact email, a summary of any existing quotas, and a justification statement. Table 2 shows the approval authority for quota increases, based on size.
| Class | Size | Approval Authority |
|---|---|---|
| Default | 50/100GB | Default by system for each user |
| Medium | Between 100GB and 1TB | HPC@LSU Operations Manager |
| Large | Over 1TB | HPC@LSU Director |
2.3. Home
All user home directories are located in the Home file system. Home is intended for the user to store source code, executables, and scripts. Home may be considered persistent storage. The data here should remain as long as the user has a valid account on the system. Home will always have a storage quota, which is a hard limit. While Home is not subject to management activities controlling capacity and performance, it should not be considered permanent storage, as system failure may result in the loss of information. Users should arrange for backing up their own data.
2.4. Project
The Project file system may be available on some systems, and provides space specific to a project. Project space allocations must be requested, and they are available for a limited time period. Allocations are typically 6 months or less. Shortly before an allocation expires the user will be notified of the upcoming expiration. Users may request to have the allocation extended. Renewal requests should be submitted at least 1 month prior to expiration to allow decision and planning time. Users should have no expectation that data will persist after expiration: it may be erased any time after 1 month from a project's expiration. Thus alternate safekeeping and protection actions must be taken in advance.
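The two deadlines above can be computed from an allocation's expiration date. This is a small sketch: the function name `allocation_milestones` is hypothetical, and treating "1 month" as 30 days is an assumption, since the policy does not define the interval precisely.

```python
from datetime import date, timedelta

ONE_MONTH = timedelta(days=30)  # assumption: "1 month" taken as 30 days


def allocation_milestones(expiration):
    """Return the latest recommended renewal request date and the
    earliest date after which project data may be erased."""
    return {
        "renewal_request_due": expiration - ONE_MONTH,
        "earliest_erase": expiration + ONE_MONTH,
    }
```

For example, `allocation_milestones(date(2010, 6, 30))` gives a renewal request date of 31 May 2010 and an earliest erase date of 30 July 2010 under the 30-day assumption.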
3. System Specific Information
| System | File System | Storage (TB) | Quota (GB) | Purge File Limit (Million) |
|---|---|---|---|---|
| IBM P5-575 Pelican | Work | 28 (GPFS) | N/A | 5 |
| | Home | (GPFS)[1] | 5 | N/A |
| Dell x86 Cluster Tezpur | Work | 60 (LPFS)[2] | N/A | 8 |
| | Home | (NFS)[3] | 5 | N/A |
| | Project | 60 | By Request | |
4. Job Use
On all systems, jobs must be run from the Work file system, and not from the Home or Project file systems. Files should be copied from Home or Project space to Work before a job is executed, and copied back when the job terminates, to avoid excessive I/O during execution that degrades system performance.
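Staging inputs into Work and confirming that a job runs from there could be sketched as follows. The function names `stage_in` and `is_under` are hypothetical, the directory layout is an assumption, and `dirs_exist_ok` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def stage_in(source_dir, work_root, job_name):
    """Copy source_dir (e.g. a directory in Home or Project) into
    work_root/job_name and return the resulting run directory."""
    run_dir = Path(work_root) / job_name
    shutil.copytree(source_dir, run_dir, dirs_exist_ok=True)
    return run_dir


def is_under(path, root):
    """Return True if path is root itself or lies inside root,
    after resolving symlinks and relative components."""
    path, root = Path(path).resolve(), Path(root).resolve()
    return path == root or root in path.parents
```

A job wrapper could call `is_under(Path.cwd(), work_root)` and refuse to start when it returns `False`, which catches jobs launched from Home or Project by mistake.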
5. /project Allocation Requests
Space is allocated on the Project file system by request only for periods of 6 months. A request for initial or renewal allocation may be made by sending an email to sys-help@loni.org. The email must include the name of the PI, a valid contact email, a current allocation code, a summary of any existing allocation account information (CPU or storage), and a justification statement. Allocations are divided into 3 classes, each with a separate approval authority (Table 4). All storage allocation requests are limited by the available space, and justifications must be commensurate with the amount of space required.
| Class | Size | Approval Authority |
|---|---|---|
| Small | Up to 100GB | HPC@LSU Staff |
| Medium | Between 100GB and 1TB | HPC@LSU Operations Manager |
| Large | Over 1TB | HPC@LSU Director |
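The class boundaries for Project allocation requests can be expressed as a simple lookup. This is an illustrative sketch only: the function name `project_allocation_class` is hypothetical, and treating 1TB as 1024GB is an assumption, since the policy does not say whether decimal or binary units are meant.

```python
GB_PER_TB = 1024  # assumption: binary units; the policy does not specify


def project_allocation_class(size_gb):
    """Map a requested Project allocation size in GB to its
    (class, approval authority) pair."""
    if size_gb <= 0:
        raise ValueError("requested size must be positive")
    if size_gb <= 100:
        return ("Small", "HPC@LSU Staff")
    if size_gb <= GB_PER_TB:
        return ("Medium", "HPC@LSU Operations Manager")
    return ("Large", "HPC@LSU Director")
```

A request of exactly 100GB is treated as Small ("up to 100GB") and exactly 1TB as Medium, because only sizes over 1TB fall in the Large class.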
[1] General Parallel File System
[2] Lustre Parallel File System
[3] Network File System
Last revised: 11 December 2009