Special HPC Event: Accelerating Production with Xeon Phi Coprocessors on the LSU SuperMIC Cluster
Introduction
- Do you want to produce your scientific results in a shorter wall-clock time on supercomputers?
- Do you want to obtain more results per unit time to increase your research productivity?
- Do you want to speed up your research progress?
- Do you want to prepare yourself for the next generation of supercomputing and learn about parallel computing with coprocessors?
The LSU HPC User Services team will hold a special event entitled "Accelerating Production" at 9 a.m. on October 14th, 2015, in the Frey 307 classroom. The purpose of this event is twofold: to raise the computational research community's awareness of the benefits of leveraging Intel Xeon Phi coprocessors in their research, and to foster collaboration between research groups and HPC user consultants through working together on porting codes to the coprocessors. All interested students, professors, and researchers at LSU are invited to this special event.
SuperMIC is LSU's fastest supercomputer, ranking 116th on the latest Top 500 list. SuperMIC has 360 compute nodes, each equipped with two Xeon Phi coprocessors that provide considerably more processing power than traditional CPUs. While some researchers are already running Xeon-Phi-accelerated programs or porting their programs to Xeon Phi, we found that the majority of LSU HPC users are not tapping into all of the processing power available on SuperMIC. This event is an opportunity for LSU researchers to experience how Xeon Phis can accelerate their programs and speed up their research progress. We feel this is important for everyone in the computational research community, not only because ongoing research projects could be expedited by using Xeon Phis, but also because it represents the direction in which the next generation of supercomputing is heading. Moreover, porting traditional parallel programs to Xeon Phi is not as difficult as you might think: only minor coding is required in typical cases, and many applications can run on the Xeon Phi directly without any recoding.
In this event, we do not intend to present technical details on how to write and optimize programs for the Xeon Phi architecture; that is covered in our regular weekly training. Instead, we will first walk you through successful examples from a variety of disciplines to show what can be achieved by accelerating applications. For the rest of the session, we will hold one-on-one discussions on how we can help you accelerate your application. We strongly encourage you to bring the programs you run day-to-day on HPC clusters to this event so that we can work together to speed them up on the Xeon Phi.
Schedule
| Time | Topic |
| --- | --- |
| 9:00 - 9:30 | Introduction |
| 9:30 - 11:00 | One-on-one discussions on how to accelerate user applications on SuperMIC using Xeon Phi |
Who can benefit from using Xeon Phi?
Below, we identify a few scenarios in which you, as a user, can likely benefit immediately by starting to use the Xeon Phi coprocessors, along with examples of each on SuperMIC.
Scenario 1: you are running a third-party code that has been optimized for Xeon Phi
Many widely used applications have already been optimized for Xeon Phi by their developers; please refer to the full list provided by Intel here. The applications in this catalog can run directly on Xeon Phi without any recoding. A sample of Xeon-Phi-enabled applications is shown below.
- Molecular dynamics: LAMMPS, NAMD, GROMACS, AMBER
- Computational chemistry: NWChem
- Materials science: Quantum ESPRESSO
- Physics: Chroma QCD, QphiX QCD
- Computational fluid dynamics: OpenLB, LBS3D
- Finance: BlackScholes SP and DP, Monte Carlo SP and DP
Example on SuperMIC: NAMD
NAMD is a popular molecular dynamics application, widely used for high-performance simulation of large biomolecular systems. On SuperMIC, speedups in the 1.x to 2.x range were achieved with two benchmark systems.
[Figures: NAMD benchmark results on SuperMIC]
Example on SuperMIC: LAMMPS
LAMMPS is a classical molecular dynamics code whose name is an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. On SuperMIC, a speedup of around 2 was achieved with the Rhodopsin benchmark and a speedup of 3.5 to 6 with the Liquid Crystal benchmark.
[Figures: LAMMPS benchmark results on SuperMIC]
Scenario 2: you are running a code that you wrote yourself
If you run not one of the Xeon-Phi-enabled applications mentioned above but a custom code written by you or your group members, it is still very likely that it can be accelerated on Xeon Phi. First, if your code uses the Intel MKL library, the MKL portions can be automatically offloaded to Xeon Phi with no changes to the source. Second, if your code uses OpenMP or a hybrid MPI-OpenMP scheme, the OpenMP blocks can be offloaded to Xeon Phi with minor modifications to the source code, as sketched below. Finally, if your code is written in pure MPI, it can still be accelerated by so-called symmetric processing, in which MPI ranks run on the host CPUs and the coprocessors at the same time.
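As a minimal sketch of the OpenMP case (array sizes and variable names are illustrative, and the exact compiler flags on SuperMIC may differ), Intel's compiler-assisted offload wraps an ordinary OpenMP loop in an offload pragma:

```c
#include <stdio.h>

#define N 1000000

int main(void)
{
    /* Statically allocated arrays keep the offload data clauses simple. */
    static float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f; }

    /* Offload the loop to the first coprocessor (mic:0): the in/out
       clauses copy a and b to the device and c back to the host, and the
       OpenMP pragma spreads the iterations over the Phi's many threads. */
    #pragma offload target(mic:0) in(a, b) out(c)
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] * b[i];

    printf("c[%d] = %.1f\n", N - 1, c[N - 1]);  /* expect 1999998.0 */
    return 0;
}
```

Compiled with the Intel compiler (e.g. `icc -qopenmp -std=c99 example.c`), the binary runs the loop on the coprocessor when one is present. For MKL-based codes, Automatic Offload needs no source changes at all: setting the environment variable `MKL_MIC_ENABLE=1` at run time lets sufficiently large BLAS/LAPACK calls such as `dgemm` run on the coprocessor automatically. For pure-MPI codes, symmetric processing builds one host binary and one native coprocessor binary (compiled with `-mmic`) and launches MPI ranks on both.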
Example on SuperMIC: Helium-FEDVR
The Helium-FEDVR code is a large-scale program developed by computational physicists at LSU to study the interaction of intense laser fields with the helium atom. The figures below show the thread speedup on the Phi and a performance comparison between the Xeon and the Phi. For instance, on one node the Phi outperforms the Xeon by 33%, i.e., the Phi alone delivers 1.33 times the Xeon's performance; when the Phi is used as a coprocessor sharing the workload with the Xeon, the combined speedup over the Xeon alone is therefore 1 + 1.33 = 2.33.
[Figures: Helium-FEDVR thread speedup on the Xeon Phi, and Xeon vs. Xeon Phi performance comparison]
Scenario 3: you are running a third-party code that has not been optimized for Xeon Phi
If you do not write your own codes but instead use open-source third-party codes, it may still be possible to modify the source to make them run on Xeon Phi with better performance. The same techniques mentioned in Scenario 2 apply here as well, as sketched below.
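As a hypothetical sketch of this approach (the routine below is a toy stand-in we wrote for illustration, not any real third-party kernel), the Intel compiler's `target(mic)` attribute builds a routine for the coprocessor as well as the host, so an otherwise unmodified computational kernel can be invoked inside an offload region:

```c
#include <stdio.h>
#include <math.h>

/* A stand-in for a third-party routine (here, a toy RMSD-style score).
   The target(mic) attribute tells the Intel compiler to compile the
   routine for the coprocessor as well as for the host. */
__attribute__((target(mic)))
double align_score(const double *a, const double *b, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        double d = a[i] - b[i];
        s += d * d;
    }
    return sqrt(s / n);
}

int main(void)
{
    enum { N = 300 };            /* e.g. 100 atoms x 3 coordinates */
    static double a[N], b[N];
    double score = 0.0;

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = i + 0.5; }

    /* Run the routine on coprocessor 0: the coordinate arrays are copied
       in, and the resulting score is copied back to the host. */
    #pragma offload target(mic:0) in(a, b) out(score)
    score = align_score(a, b, N);

    printf("score = %.2f\n", score);   /* expect 0.50 */
    return 0;
}
```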
Example on SuperMIC: eFindSite
eFindSite, written in C++, is a structural bioinformatics program for predicting ligand-binding sites in proteins; its structure-alignment component is a third-party code written in Fortran77. The parallel version of eFindSite was implemented by offloading that third-party code to the Phi, achieving a 17.7-fold performance increase over the serial (one-thread) version.
How much can computer programs be accelerated on SuperMIC?
Here is a comparison between the LSU clusters SuperMike-II and SuperMIC.
First, the CPUs on a SuperMIC compute node (two 2.8 GHz 10-core Ivy Bridge-EP Xeon 64-bit processors) are faster than those on SuperMike-II (two 2.6 GHz 8-core Sandy Bridge Xeon 64-bit processors), and the memory of a SuperMIC compute node (64 GB) is double that of SuperMike-II (32 GB). Our test results indicate that some programs run up to 1.5 times faster on SuperMIC (using CPUs only) than on SuperMike-II.
Second, programs can be accelerated on the Xeon Phi coprocessors. In theory, the computational power of one Xeon Phi coprocessor is 2.7 times that of a SuperMIC node's two Ivy Bridge CPUs combined. In practice, the acceleration for many programs is typically between 1.5 and 2 times.
Third, further acceleration can be obtained when both Xeon Phi coprocessors on a compute node are used: the theoretical maximum acceleration of a SuperMIC compute node using its CPUs plus both Xeon Phis, relative to the CPUs alone, is 6.4 times.
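As a sanity check on these figures (assuming the commonly reported SuperMIC parts, which are our assumption here: Xeon Phi 7120P coprocessors with a peak of roughly 1.2 double-precision TFLOPS each, and two E5-2680v2 CPUs with roughly 0.45 TFLOPS combined, written $P_{\text{Phi}}$ and $P_{\text{CPU}}$):

```latex
\frac{P_{\text{Phi}}}{P_{\text{CPU}}} \approx \frac{1.2\ \text{TFLOPS}}{0.45\ \text{TFLOPS}} \approx 2.7,
\qquad
\frac{P_{\text{CPU}} + 2\,P_{\text{Phi}}}{P_{\text{CPU}}} \approx 1 + 2 \times 2.7 = 6.4
```

where $P_{\text{CPU}}$ denotes the combined peak of a node's two CPUs.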
Additional information on Xeon Phi benchmarks is provided by Intel here. We provide benchmark results for NAMD and LAMMPS on SuperMIC here.
Final remarks
Our intent is to help you make use of the Xeon Phi coprocessors on SuperMIC and speed up your research. Continued code-porting assistance from the HPC User Services team will be provided as needed. If you utilize the Xeon Phi coprocessors, you get the added bonus of spending fewer allocation Service Units (SUs) to complete a given job. And if you document your use of Xeon Phi in a proposal for an allocation on SuperMIC, the proposal is more likely to be reviewed favorably. We believe the effort to migrate your codes from the CPU to the Xeon Phi is worthwhile. Finally, we welcome all interested LSU researchers to this special HPC event.
Registration
Instructors
- Feng Chen, Ph.D (Civil Engineering)
- Shaohao Chen, Ph.D (Physics)
- Wei Feinstein, Ph.D (Biology)
- Xiaoxu Guan, Ph.D (Physics)
- Jim Lupo, Ph.D (Astrophysics)
- Le Yan, Ph.D (Chemical Engineering)