(Oak Ridge National Laboratory: Oak Ridge, TN) -- In partnership with the National Oceanic and Atmospheric Administration (NOAA), Oak Ridge National Laboratory (ORNL) is launching a supercomputer dedicated to climate science research. The new system is the fifth supercomputer to be installed and run by the National Climate-Computing Research Center (NCRC) at ORNL.
The NCRC was established in 2009 as part of a strategic partnership between NOAA and the U.S. Department of Energy, and is responsible for procuring, installing, testing, and operating several supercomputers dedicated to climate modeling and simulations. The partnership’s goal is to increase NOAA’s climate modeling capabilities to further critical climate research. To that end, the NCRC has installed a series of increasingly powerful computers since 2010, each formally named “Gaea.” The latest system, also referred to as C5, is an HPE Cray machine with more than 10 petaflops (or 10 million billion calculations per second) of peak theoretical performance—almost double the power of the two previous systems combined.
C5 is one of three NOAA computers operating at ORNL. Typically, the NCRC only operates two supercomputers at a time for NOAA users. They are replaced on a rotating schedule to provide NOAA users with uninterrupted access to more powerful machines while also minimizing operational and maintenance costs.
“The power efficiency, cooling efficiency, and CPU power all increase significantly over time,” explains Paul Peltz, ORNL technical lead for Gaea. “We can replace all of the computational power of C3 with a single cabinet of C5, which has eight cabinets total.”
Originally scheduled for fall 2021, C5’s delivery and installation were delayed several months by supply chain issues. “It was a unique period of time that made purchasing a system of this size very challenging,” says Chris Fuson, NOAA program manager at ORNL.
When the hardware arrived and C5 was assembled in summer 2022, the team began the testing and acceptance process, a standard but critical phase that pushes the system in order to test its reliability, stability, and performance under various workloads. This work was led by Verónica Melesse Vergara, leader of the System Acceptance and User Environment group. Working with her were ORNL staff members Tom Papatheodore, Dan Dietz, and Nick Hagerty. Initial tests, which find faulty hardware and confirm basic functionality, were followed by benchmarks and applications provided by NOAA that were representative of actual workloads.
“We load up the system with the application benchmarks and ensure the system can run with the expected performance,” says Dietz, a high-performance computing, or HPC, engineer at ORNL. “We slowly loaded up the number of copies of each benchmark running at once, easing on the gas to ensure the system doesn’t run into any issues under heavy load. We want to see consistent performance among all copies of the benchmark.”
“Finding problems and fixing them before we open the system to users is rewarding,” says Vergara. “If we did our jobs correctly, then users will be able to run without major challenges; so often they are unaware of the bugs that were fixed before they had access.”
When Gaea goes into full production and is open to NOAA users, the ORNL team will take a step back and focus on system maintenance while preparations for the next system begin.
“ORNL is a custodian of the machine for NOAA,” says Peltz. “We provide strong HPC knowledge and top-class facilities, and we invest heavily in our ability to house these machines in a secure manner. Those are things that NOAA doesn’t have to worry about. This interoperability between agencies is great.”
UT-Battelle manages ORNL for the U.S. Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, visit energy.gov/science.