GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliability (Conference Paper, November 2020)
US Department of Energy, Office of Science High Performance Computing Facility Operational Assessment 2019 Oak Ridge Leadership Computing Facility (ORNL Report, June 2020)
Comparative I/O Workload Characterization of Two Leadership Class Storage Clusters... (Conference Paper, November 2015)
Reliability Lessons Learned From GPU Experience With The Titan Supercomputer at Oak Ridge Leadership Computing Facility (Conference Paper, November 2015)
Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems... (Conference Paper, June 2015)
Experience with GPUs on the Titan Supercomputer from a Reliability, Performance and Power Perspective (Conference Paper, May 2015)