The Flux Supercomputing Workload Manager: Improving on Innovation and Planning for the Future

The Flux development team, from top left: Thomas Scogland, Albert Chu, Tapasya Patki, Stephen Herbein, Mark Grondona, Becky Springmeyer, Christopher Moussa, Jim Garlick, Daniel Milroy, Clay England, Michela Taufer (Academic Co-PI), Ryan Day, Dong H. Ahn (PI), Barry Rountree, Zeke Morton, Jae-Seung Yeom, James Corbett. Credit: Lawrence Livermore National Laboratory (LLNL)

Making scientific breakthroughs with the help of supercomputers and new data-science approaches often requires the use of complex scientific workflows. But the combination of workflows and hardware innovations is increasingly rendering traditional computing resource management and scheduling software incapable of providing the necessary support.

So computer scientists at LLNL devised an open-source software framework called Flux that manages and schedules computing workflows to use system resources more efficiently and provide results faster.

Flux has already made an impact in COVID-19, cancer, and advanced manufacturing research projects. That mix of early successes and prospects for future use likely contributed to Flux winning a 2021 R&D 100 Award, according to Flux team principal investigator Dong Ahn.

Read more.

ECP Issues PathForward Final Assessment Report

The PathForward element of the Exascale Computing Project (ECP) was established to prepare US industry for exascale system procurements and, more generally, to improve US competitiveness in the worldwide computing market.

A competitive PathForward RFP (Request for Proposals) was released in 2016, seeking responses to improve application performance and developer productivity while maximizing the energy efficiency and reliability of an exascale system. Following a response review process, six responses were selected for award, and contract negotiations began.

All six selected responses successfully led to contracts that were awarded and announced in June 2017.

The quantity and breadth of the work and milestones across the six contracts made it challenging for ECP to ensure that the work was properly reviewed and that feedback was provided to the contract awardees in a timely fashion.

The PathForward program engaged working groups for each contract, with members from each of the six core ECP DOE national laboratories tasked with ensuring the reviews were performed effectively by subject matter experts from across those institutions.

ECP has issued a report that serves as a companion to the deliverable for the ECP milestone PM-HI-1040, Assess PathForward Impact Against Exascale Hardware Challenges.

The report summarizes the final status of each PathForward project, describes progress achieved against PathForward contract milestones, and includes a final assessment of each vendor’s progress on key exascale challenges.

This companion report details the results of the PathForward research, and its impact on vendor products and US exascale systems, to the extent possible without disclosing proprietary information.

Additionally, it captures lessons learned to inform future projects in general and in high-performance computing in particular.

Read more and obtain the report.

ECP's 2021 Application Development Milestone Report Summarizes Subproject Status

A milestone report that summarizes the status of all 30 ECP Application Development subprojects at the end of FY20 may be obtained through ECP's website.

The report not only provides an accurate snapshot of each subproject’s status but also represents an unprecedentedly broad account of experiences in porting large scientific applications to next-generation high-performance computing architectures.

Read more and obtain the report.

ECP Brings a Host of Hardware-Accelerated and GPU-Friendly Optimizations to the MPICH Library

Message Passing Interface (MPI) has been the communications backbone for distributed high-performance computing (HPC) scientific applications since its introduction in the 1990s. MPI is still essential to scientific computing because modern HPC workloads are too demanding for any single computational node to handle alone.

Instead, scientific applications are distributed across many computational nodes. It is this combined distributed computational throughput—enabled by MPI—that provides the petascale and soon-to-be-exascale performance of the world’s largest supercomputers.

The MPICH library is one of the most popular implementations of MPI. Primarily developed at Argonne National Laboratory (ANL) with contributions from external collaborators, MPICH has long delivered high performance by working closely with vendors: the MPICH software provides the link between the MPI interface used by application programmers and the low-level hardware acceleration that vendors provide for their network devices.

Yanfei Guo, the principal investigator of the Exascale MPI project in the Exascale Computing Project (ECP) and assistant computer scientist at ANL, is following this tradition. According to Guo, “The ECP MPICH team is working closely with vendors to add general optimizations—optimizations that will work in all situations—to speed MPICH and leverage the capabilities of accelerators, such as GPUs.”

Read more.

A Roundup of Recently Published Content

New on ECP's YouTube Channel

"The IDEAS-ECP Webinar: Using the 'Wrong' Programming Approach on Leadership Computing Facility Systems" video
This webinar considers the impact of using a "wrong" programming approach for a given system. The presenter reviews a few of these wrong programming approaches for current and near-term future systems and discusses specific software packages that enable such approaches, along with lessons learned.

The ECP website offers a Library section with access to pages containing a searchable list of journal and conference proceedings publications of ECP-funded research, technical highlight articles, links to technical reports, and the Let’s Talk Exascale podcast episode list.

The Library section offers a searchable listing of publications dating back to 2017 that contains about 450 entries to date.

Featured Publication Summaries

Featured summaries in the Library section capture the essence of recent especially notable ECP publications. New summaries are added regularly. The following is the latest summary post.

Technical Highlights

The Technical Highlights section of ECP's online Library continues to grow. The following is the latest technical highlight article post.

ECP Brings a Host of Hardware-Accelerated and GPU-Friendly Optimizations to the MPICH Library

ECP provides a robust developer training and productivity program. The objective is to keep application and software team members, staff, and other stakeholders abreast of emerging technologies. This effort is a close collaboration of US Department of Energy facilities, vendors, and the ECP community.

Upcoming Training Events


Not a Subscriber to ECP Updates?

If you received this email because one of your very thoughtful colleagues passed it along to you, you can sign up for your own subscription to ECP updates!
Copyright © 2022 Exascale Computing Project, all rights reserved.
