Software developed by Livermore Computing manages and schedules complex supercomputing workflows for breakthrough science
The Flux Supercomputing Workload Manager: Improving on Innovation and Planning for the Future
The Flux development team, from top left: Thomas Scogland, Albert Chu, Tapasya Patki, Stephen Herbein, Mark Grondona, Becky Springmeyer, Christopher Moussa, Jim Garlick, Daniel Milroy, Clay England, Michela Taufer (Academic Co-PI), Ryan Day, Dong H. Ahn (PI), Barry Rountree, Zeke Morton, Jae-Seung Yeom, James Corbett. Credit: Lawrence Livermore National Laboratory (LLNL)
Making scientific breakthroughs with the help of supercomputers and new data-science approaches often requires complex scientific workflows. But the combination of those workflows with rapid hardware innovation is increasingly outpacing what traditional resource management and scheduling software can support.
Computer scientists at LLNL therefore devised an open-source software framework called Flux that manages and schedules computing workflows so they use system resources more efficiently and deliver results faster.
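To give a concrete sense of how work is handed to Flux, the sketch below uses the Flux Python bindings that ship with flux-core. It is a minimal, illustrative example only: it assumes the script runs inside an already-running Flux instance (for example, one started with `flux start`), and the job command and resource counts are placeholders, not a recipe from the Flux team.

```python
# Minimal sketch of submitting a job through Flux's Python bindings.
# Assumes flux-core is installed and this script runs inside a Flux
# instance; the command and resource counts are illustrative.
import flux
import flux.job

handle = flux.Flux()  # connect to the enclosing Flux instance

# Describe a simple 4-task job; Flux schedules it against the
# resources owned by this instance.
jobspec = flux.job.JobspecV1.from_command(
    command=["hostname"], num_tasks=4, cores_per_task=1
)

jobid = flux.job.submit(handle, jobspec)
print(f"submitted job {jobid}")
```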
The impact of Flux has already been realized in COVID-19, cancer, and advanced manufacturing research projects. That mix of early successes and prospects for future use likely contributed to Flux winning a 2021 R&D 100 Award, according to Flux team principal investigator Dong Ahn.
The PathForward element of the Exascale Computing Project (ECP) was established to prepare US industry for exascale system procurements and, more generally, to improve US competitiveness in the worldwide computing market.
A competitive PathForward RFP (Request for Proposals) was released in 2016, seeking responses to improve application performance and developer productivity while maximizing the energy efficiency and reliability of an exascale system. Following a response review process, six responses were selected for award, and contract negotiations began.
All six selected responses successfully led to contracts that were awarded and announced in June 2017.
The quantity and breadth of the work and milestones across the six contracts presented a challenge for ECP: ensuring that the work was properly reviewed and that feedback was provided to the contract awardees in a timely fashion.
The PathForward program engaged working groups for each contract, with members from each of the six core ECP DOE national laboratories tasked with ensuring the reviews were performed effectively by subject matter experts from across those institutions.
ECP has issued a report that serves as a companion to the deliverable for the ECP milestone PM-HI-1040, Assess PathForward Impact Against Exascale Hardware Challenges.
The report summarizes the final status of each PathForward project, describes progress achieved against PathForward contract milestones, and includes a final assessment of each vendor’s progress on key exascale challenges.
This companion report details the results of the PathForward research, to the extent possible without disclosing proprietary information, and their impact on vendor products and US exascale systems.
It also captures lessons learned to inform future projects in general and high-performance computing projects in particular.
ECP's 2021 Application Development Milestone Report Summarizes Subproject Status
A milestone report that summarizes the status of all 30 ECP Application Development subprojects at the end of FY20 may be obtained through ECP's website.
The report not only provides an accurate snapshot of each subproject’s status but also represents an unprecedentedly broad account of experiences in porting large scientific applications to next-generation high-performance computing architectures.
ECP Brings a Host of Hardware-Accelerated and GPU-Friendly Optimizations to the MPICH Library
Message Passing Interface (MPI) has been the communications backbone for distributed high-performance computing (HPC) scientific applications since its introduction in the 1990s. MPI is still essential to scientific computing because modern HPC workloads are too demanding for any single computational node to handle alone.
Instead, scientific applications are distributed across many computational nodes. It is this combined distributed computational throughput—enabled by MPI—that provides the petascale and soon-to-be-exascale performance of the world’s largest supercomputers.
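As a rough illustration of that distributed model, the hedged sketch below uses mpi4py, a Python wrapper that can run on top of an MPI implementation such as MPICH. Each rank computes a partial result and a collective operation combines them; the workload and array sizes are placeholders, not taken from any ECP application.

```python
# Hedged sketch of the distributed model MPI enables: each rank computes
# a partial result, and a collective combines them across nodes.
# Launch with, e.g.:  mpiexec -n 4 python partial_sums.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank works on its own strided slice of a 1..N sum (placeholder work).
N = 1_000_000
chunk = np.arange(rank + 1, N + 1, size, dtype=np.float64)
local_sum = chunk.sum()

# Combine the per-rank partial sums into one global result on every rank.
total = comm.allreduce(local_sum, op=MPI.SUM)
if rank == 0:
    print(f"sum of 1..{N} computed on {size} ranks: {total:.0f}")
```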
The MPICH library is one of the most popular implementations of MPI. Developed primarily at Argonne National Laboratory (ANL) with contributions from external collaborators, MPICH has long delivered high performance by working closely with vendors: the MPICH software provides the link between the MPI interface used by application programmers and the low-level hardware acceleration that vendors provide for their network devices.
Yanfei Guo, the principal investigator of the Exascale MPI project in the Exascale Computing Project (ECP) and assistant computer scientist at ANL, is following this tradition. According to Guo, “The ECP MPICH team is working closely with vendors to add general optimizations—optimizations that will work in all situations—to speed MPICH and leverage the capabilities of accelerators, such as GPUs.”
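One way such GPU-oriented optimizations surface at the application level is GPU-aware communication, where device buffers are passed directly to MPI calls instead of being staged through host memory. The sketch below is a hedged illustration using mpi4py with CuPy arrays; it assumes an MPI build with GPU support (for example, a GPU-enabled MPICH) and a CUDA-capable GPU per rank, and the buffer contents are placeholders.

```python
# Hedged sketch of GPU-aware MPI: device arrays are handed directly to an
# MPI collective, so a GPU-enabled MPI library (e.g., an MPICH build with
# GPU support) can move the data without an explicit copy to host memory.
# Requires mpi4py, CuPy, and a CUDA-capable GPU per rank.
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank fills a buffer on its own GPU (placeholder data).
sendbuf = cp.full(1024, rank, dtype=cp.float64)
recvbuf = cp.empty_like(sendbuf)

# Uppercase Allreduce uses the buffer protocol; mpi4py recognizes CuPy
# arrays via __cuda_array_interface__, so the device pointer reaches MPI.
comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)

if rank == 0:
    # Every element of recvbuf now holds the sum of all rank values.
    print("first element of reduced buffer:", float(recvbuf[0]))
```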
ECP @ SIAM PP22 web page
The SIAM Conference on Parallel Processing for Scientific Computing (SIAM-PP22), held virtually February 23–26, showcased a broad range of work by the ECP community. A page published on the ECP website outlines the specific instances of participation.
The ECP website offers a Library section with access to pages containing a searchable list of journal and conference proceedings publications of ECP-funded research, technical highlight articles, links to technical reports, and the Let’s Talk Exascale podcast episode list.
Novel simulation framework answers additive manufacturing research questions for the exascale era
Tusas, a novel open-source phase-field simulation framework, provides a robust route for simulating materials microstructures and complements related experimental studies more accurately than existing mesoscale tools. The work is published in the September 2021 issue of the Journal of Computational Physics.
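For readers unfamiliar with the method, a phase-field model evolves a continuous order parameter whose values distinguish phases or grains in the microstructure. The equation below is a generic Allen-Cahn-type evolution of the kind such frameworks discretize; it is shown only to illustrate the class of models, not the specific free-energy functional or governing equations implemented in Tusas.

```latex
% Generic Allen-Cahn-type phase-field evolution (illustrative only):
% the order parameter \phi relaxes to lower a free-energy functional F
% with mobility M and gradient-energy coefficient \epsilon.
\frac{\partial \phi}{\partial t}
  = -M \, \frac{\delta F}{\delta \phi}
  = -M \left( \frac{\partial f(\phi)}{\partial \phi}
              - \epsilon^{2} \nabla^{2} \phi \right),
\qquad
F[\phi] = \int_{\Omega} \left( f(\phi)
          + \frac{\epsilon^{2}}{2} \, \lvert \nabla \phi \rvert^{2} \right) dV
```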
ECP provides a robust developer training and productivity program. The objective is to keep application and software team members, staff, and other stakeholders abreast of emerging technologies. This effort is a close collaboration of US Department of Energy facilities, vendors, and the ECP community.
If you received this email because one of your very thoughtful colleagues passed it along to you, you can sign up for your own subscription to ECP updates!