Analyzing a Five-Year Failure Record of a Leadership-Class Supercomputer (Conference Paper, October 2019)
US Department of Energy, Office of Science High Performance Computing Facility Operational Assessment 2018 Oak Ridge Leadership Computing Facility (ORNL Report, May 2019)
Balancing Performance and Portability with Containers in HPC: An OpenSHMEM Example (Conference Paper, January 2019)
Comparative I/O Workload Characterization of Two Leadership Class Storage Clusters... (Conference Paper, November 2015)
Reliability Lessons Learned From GPU Experience With The Titan Supercomputer at Oak Ridge Leadership Computing Facility (Conference Paper, November 2015)
Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems... (Conference Paper, June 2015)
Experience with GPUs on the Titan Supercomputer from a Reliability, Performance and Power Perspective (Conference Paper, May 2015)