SEEr
a Scalable, Energy-Efficient HPC environment for AI-enabled Science
AI-enabled science, where advanced machine-learning technologies are used for surrogate models, autotuning, and in situ data analysis, is quickly being adopted in science and engineering to tackle complex and challenging computational problems. The wide adoption of heterogeneous systems that combine different types of processing devices (CPUs, GPUs, and AI accelerators) further complicates the execution of AI-enabled science on supercomputers. Research on AI-enabled simulations on heterogeneous systems remains far from sufficient.
The long-term research vision is to develop SEEr, a Scalable, Energy-Efficient HPC environment that scales up and accelerates AI-enabled science for scientific discovery. This PPoSS planning project explores fundamental questions that must be answered to realize this vision. The team focuses on scalable surrogate models for an incompressible computational fluid dynamics application using OpenFOAM, cost models for this application on heterogeneous resources, dynamic task mapping for efficient execution, and performance and power monitoring and characterization to explore tradeoffs among performance, scalability, and energy efficiency on a state-of-the-art heterogeneous testbed at ALCF. The unified team of researchers tackles the problem in a cross-layer manner, focusing on the synergies among application algorithms, programming languages and compilers, runtime systems, and high-performance computing.
Faculty:
- Zhiling Lan, Stefan Muller, Romit Maulik (Illinois Tech)
- Valerie Taylor, Xingfu Wu (UChicago)
- Mike Papka (NIU)
Students:
- Melanie Cornelius (PhD)
- Hunter Negron (BS/MS)
- Hannah Greenblatt (BS/MS)
- Pranjal Naik (MS)
Major project events:
- Biweekly Zoom meeting, 4-5 pm on Fridays
- In-person PI meeting at Argonne on May 10th, 2022.
- In-person research meeting with climate modeling group at Argonne on August 5th, 2022.
- In-person extended PI meeting at Illinois Tech on December 2nd, 2022.
Software Tools:
- PythonFOAM: In-situ data analyses with OpenFOAM and Python at Argonne Link
- Mantis, a unified performance and power profiling interface for applications running on heterogeneous systems Link
- Two AI-enabled applications (mini-app and PythonFOAM) explored in the SEEr planning project. The codes and run scripts for the heterogeneous CPU-GPU systems at Argonne Leadership Computing Facility (ALCF) are available at the team’s GitHub repository Link
Papers and reports:
- H. Greenblatt, H. Negron, M. Cornelius, S. Muller, R. Maulik, X. Wu, M. Papka, and V. Taylor, “Performance Characterization of AI-enabled Scientific Applications”, Technical report, August 2022. [pdf]
- H. Greenblatt, “CS597 Report: Study of PythonFoam on ThetaGPU”, Technical report, Dec 2022. [pdf]
- H. Negron and P. Naik, “CS597 Report: Mini-App Analysis on Polaris and ThetaGPU”, Technical report, Dec 2022. [pdf]
- M. Cornelius, H. Greenblatt, and Z. Lan, “Mantis: A Unified Performance and Power Profiling Interface on Heterogeneous Systems”, Technical report, August 2022. [pdf]
- H. Greenblatt, “CS597 Report: PythonFoam Benchmarking”, Technical report, May 2022. [pdf]
- H. Negron and Z. Zheng, “CS597 Report: Mini-app Benchmarking”, Technical report, May 2022. [pdf]
- X. Wu, V. Taylor, and Z. Lan, “Performance and Energy Improvement of the ECP Proxy App SW4lite under Various Workloads”, SC2021 Workshop on Memory-Centric High Performance Computing (MCHPC’21), Nov. 2021. [pdf]
- X. Wu, V. Taylor, and Z. Lan, “Performance and Power Modeling and Prediction Using MuMMI and Ten Machine Learning Methods”, Concurrency and Computation: Practice and Experience, August 2022, https://doi.org/10.1002/cpe.7254. [pdf]
Acknowledgement:
This project is supported by the US National Science Foundation (CCF 2119294, 2119203, 2119056). Note: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.