Gordon Bell Prize finalist code, MENNDL, creates a neural network that performs image analysis on par with human experts
Using the Oak Ridge Leadership Computing Facility’s (OLCF’s) new leadership-class supercomputer, Summit, a team from the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL) demonstrated the ability to generate intelligent software that could revolutionize how scientists manipulate materials at the atomic scale.
The team’s code, called Multinode Evolutionary Neural Networks for Deep Learning (MENNDL), reached a sustained performance of 152.5 petaflops using mixed-precision calculations. Consequently, the project earned a finalist nomination for the Association for Computing Machinery’s Gordon Bell Prize, awarded each year to recognize outstanding achievement in high-performance computing.
MENNDL automatically creates artificial neural networks (computational systems that loosely mimic the human brain) that can tease important information out of scientific datasets. The algorithm has been applied successfully to medical research, satellite images, high-energy physics data, and neutrino research.
A team led by ORNL’s Robert Patton recently used MENNDL and the OLCF’s Summit supercomputer to automatically create a deep-learning network specifically tuned for data produced by advanced microscopes called scanning transmission electron microscopes (STEMs). The resulting network is capable of reducing the human effort needed to analyze STEM images from months to hours.
Such a significant reduction could have huge implications. With advanced microscopes capable of producing hundreds of images per day, real-time feedback from optimized algorithms generated by MENNDL could dramatically accelerate image processing and pave the way for new scientific discoveries in materials science, among other domains. The technology could eventually mature to the point where scientists gain the ability to fabricate materials at the atomic level.
“What we’re essentially doing is going through a big open field trying to find these golden nuggets,” Patton said. “Running MENNDL on Summit helps us increase our search space so that we can cover thousands of these networks at the same time. It helps us munch through the data faster.”
A network for the nanoscale
Mapping the insides of solar cells, semiconductors, batteries, and biological cells requires a big microscope, and not just any microscope. STEM microscopes are specialized to tackle these kinds of tasks, and their ability to zero in on the nanoscale and atomic-scale structure of materials can help scientists explore the materials’ properties and behaviors.
To get a look at materials on the atomic scale, STEM microscopes pass a beam of electrons, which are negatively charged particles, through a sample to form images of it. Electrons that lose energy as they pass through provide information about the structure of the material. These capabilities make STEM microscopes useful for research involving materials such as the semimetal graphene and in fields such as quantum computing.
The capabilities of STEM microscopes far surpass those of optical microscopes, but understanding and analyzing defects at the nanoscale can be a challenge. Scientists have a difficult time deciphering microscopy images with high levels of noise or missing structural elements, and they have yet to figure out how to automatically extract structural information from these images. MENNDL, though, holds promise for such tasks.
Using MENNDL, the team trained a network to recognize the microscopic defects in one frame of a STEM “movie,” a series of high-resolution images the microscope produces in succession. In this case, the movie showed defects in a single layer of molybdenum-doped tungsten disulphide (a 2D material that has applications in solar cells) under 100 kV electron beam irradiation.
The team divided the single frame into tiles and then trained the network using the tiles as examples of which patterns to look for. Because electron microscopy images contain many repeating elements, dividing an image into tiles gives the network many training examples while limiting redundant examples that can eat up computational time. This helps the network classify the pixels in the whole image more efficiently.
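Tiling a frame can be sketched in a few lines of Python. This is a minimal illustration, not MENNDL’s code: the function name, the tile size, the optional stride, and the idea of cutting a per-pixel defect label mask with the same grid are assumptions chosen for the example.

```python
def tile_image(image, tile_size, stride=None):
    """Cut a 2D image (a list of equal-length rows) into square tiles.

    Returns a list of ((top, left), tile) pairs. With the default stride equal
    to tile_size the tiles do not overlap; partial tiles at the right and
    bottom edges are skipped.
    """
    if stride is None:
        stride = tile_size
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - tile_size + 1, stride):
        for left in range(0, width - tile_size + 1, stride):
            tile = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            tiles.append(((top, left), tile))
    return tiles


# Toy 4x4 "frame" and a matching per-pixel label mask (1 = defect, hypothetical).
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
labels = [[1 if (r, c) == (2, 1) else 0 for c in range(4)] for r in range(4)]

# Cutting image and labels with the same grid keeps each tile paired with its labels.
training_pairs = [
    (img_tile, lbl_tile)
    for (_, img_tile), (_, lbl_tile) in zip(tile_image(frame, 2), tile_image(labels, 2))
]
print(len(training_pairs))  # 4 non-overlapping 2x2 tiles
```

Passing a stride smaller than the tile size would produce overlapping tiles and therefore more (but more redundant) examples; the non-overlapping default reflects the trade-off the paragraph describes.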
MENNDL uses an algorithm to search for the network topology (the number and type of layers) best suited to classifying these pixels, guided by the performance of networks from earlier generations of the evolving population.
“Think of a neural network like a sandwich,” Patton said. “You have to figure out which layers are stacked in what order, and different sandwiches are going to have different layers. That’s our biggest struggle in deep learning right now: when you come to a new dataset that no one has touched before, you don’t know what those layers should be.”
As an evolutionary algorithm, MENNDL builds on the “survival of the fittest” principle: the neural networks evolve to “survive.” Initially, there is an almost infinite number of possible parameter combinations, but that doesn’t mean they’re all good at the assigned task. Networks that are better at correctly identifying defects in the data are combined to produce new networks that have an increasing chance of being correct as they evolve, whereas the poor performers are excluded from future generations.
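The select-and-recombine loop described above can be sketched as a small genetic algorithm. Everything here is an illustrative assumption rather than MENNDL’s implementation: the layer encoding, the one-point crossover, the mutation rate, the population sizes, and especially `toy_fitness`, which stands in for the expensive step of training each candidate network and scoring how well it finds defects.

```python
import random

# Hypothetical search space: a network is a tuple of layers,
# each layer a (type, kernel_size, stride) tuple.
LAYER_TYPES = ("conv", "pool", "dropout")
KERNEL_SIZES = (1, 3, 5, 7)
STRIDES = (1, 2)
MAX_LAYERS = 8


def random_layer(rng):
    return (rng.choice(LAYER_TYPES), rng.choice(KERNEL_SIZES), rng.choice(STRIDES))


def random_network(rng):
    return tuple(random_layer(rng) for _ in range(rng.randint(1, MAX_LAYERS)))


def crossover(a, b, rng):
    """One-point crossover: the front of parent a joined to the back of parent b."""
    child = a[:rng.randint(1, len(a))] + b[rng.randint(0, len(b) - 1):]
    return child[:MAX_LAYERS]


def mutate(net, rng, rate=0.2):
    """Replace each layer with a fresh random layer with probability `rate`."""
    return tuple(random_layer(rng) if rng.random() < rate else layer for layer in net)


def evolve(fitness, rng, pop_size=20, n_survivors=5, generations=30):
    """Truncation selection: keep the fittest networks, breed the rest from them."""
    population = [random_network(rng) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        survivors = ranked[:n_survivors]  # poor performers are excluded here
        children = []
        while n_survivors + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return max(population, key=fitness), best_per_generation


def toy_fitness(net):
    """Stand-in for 'train the network and measure defect-detection accuracy'."""
    good_layers = sum(1 for layer in net if layer == ("conv", 3, 1))
    return good_layers - 0.1 * len(net)


best, history = evolve(toy_fitness, random.Random(0))
```

Because the survivors are carried into the next generation unchanged, the best score in this sketch can never decrease from one generation to the next.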
Among these parameters are the size of the kernels (the filters that scan the data for patterns) and the stride, the number of pixels a filter shifts between positions. A smaller stride makes neighboring filter positions overlap more, so the filters comb more meticulously through the data.
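How kernel size and stride interact follows standard convolution arithmetic, which applies to convolutional networks in general rather than to MENNDL specifically. The helper names below are illustrative. Along one axis of length n, a kernel of size k moved with stride s (and padding p) visits (n + 2p − k) // s + 1 positions, and neighboring positions share k − s pixels.

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Number of positions a kernel visits along one axis of length n."""
    return (n + 2 * padding - kernel) // stride + 1


def overlap(kernel, stride):
    """Pixels shared by neighboring kernel positions (0 if they don't overlap)."""
    return max(kernel - stride, 0)


# A 3-pixel kernel sliding across a 64-pixel-wide input.
for stride in (1, 2, 3):
    print(stride, conv_output_size(64, 3, stride), overlap(3, stride))
# 1 62 2
# 2 31 1
# 3 21 0
```

Halving the stride roughly doubles the number of positions examined, which is why a finer stride costs more computation per layer.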
MENNDL analyzed more than 2 million networks over the course of the code’s 4-hour run and eventually provided the team with a rough sketch of a network design that would perform at least as well as a human domain expert. The team used this final, best-performing network to analyze two subsequent frames in the movie and confirmed the network’s unprecedented ability to detect the defects.
“This is data that had not been looked at before,” Patton said. “The materials scientists whose data we used were previously doing what many others in the deep learning community are currently doing: trying to manually design a neural network. That’s where we came in.”
Populating all of Summit
Not only does MENNDL parallelize well on large HPC architectures, but it was also designed to keep a rolling population of neural networks running on the machine.
“Each GPU evaluates one network at a time, but as soon as a GPU is done, we hand it another network,” Patton said. “The algorithm waits until it has a sufficient number of results back and then runs this evolutionary process where it combines some of these networks and throws out others.”
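The scheduling pattern Patton describes (hand out work as GPUs free up, evolve once enough results are back) can be simulated with an event queue. This is a single-process sketch under stated assumptions, not MENNDL’s scheduler: the float “networks,” the toy fitness and evaluation-time functions, and the batch and survivor counts are all hypothetical.

```python
import heapq
import random


def rolling_population(n_gpus, total_evals, batch, keep, fitness, eval_time, rng):
    """Event-driven sketch of asynchronous ("rolling") evolution.

    Each simulated GPU evaluates one network at a time and receives a new one as
    soon as it finishes. After every `batch` results, an evolutionary step merges
    them with the current survivors and keeps only the `keep` best.
    Networks are plain floats here, a toy stand-in for real architectures.
    """
    survivors = []  # (score, network) pairs kept by the last evolutionary step
    pending = []    # results that arrived since the last evolutionary step
    events = []     # min-heap of (finish_time, gpu_id, network)

    def next_network():
        if len(survivors) < 2:
            return rng.uniform(0.0, 10.0)  # random network until evolution starts
        (_, a), (_, b) = rng.sample(survivors, 2)
        return (a + b) / 2 + rng.gauss(0.0, 0.5)  # combine two parents, then mutate

    for gpu in range(min(n_gpus, total_evals)):
        net = next_network()
        heapq.heappush(events, (eval_time(net), gpu, net))
    issued = len(events)

    completed = 0
    while events:
        now, gpu, net = heapq.heappop(events)
        pending.append((fitness(net), net))
        completed += 1
        if len(pending) >= batch:
            # Evolutionary step: combine the results, keep the best, throw out the rest.
            survivors = sorted(survivors + pending, reverse=True)[:keep]
            pending = []
        if issued < total_evals:
            # This GPU is idle again, so hand it another network right away.
            new_net = next_network()
            heapq.heappush(events, (now + eval_time(new_net), gpu, new_net))
            issued += 1
    return sorted(survivors + pending, reverse=True)[:keep], completed


best, done = rolling_population(
    n_gpus=4, total_evals=200, batch=8, keep=4,
    fitness=lambda x: -(x - 7.0) ** 2,       # toy objective: best "network" is 7.0
    eval_time=lambda x: 1.0 + 0.1 * abs(x),  # toy cost that varies per network
    rng=random.Random(42),
)
```

In this sketch no simulated GPU waits for a whole generation to finish: slow and fast evaluations overlap, and evolution runs whenever enough results have accumulated.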
MENNDL also uses Summit’s burst buffer, node-local storage that lets data be kept on the nodes themselves rather than on the shared file system. This minimizes data transfer and improves code performance by keeping the GPUs busier.
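On the software side, using node-local storage mostly means copying input data to it once and reading from there afterward. The Python sketch below shows that staging pattern under assumptions: it is not MENNDL’s code, and the `LOCAL_SCRATCH` environment variable and the paths in the usage comment are hypothetical. Where node-local storage is actually mounted on a given machine is described in that facility’s user documentation.

```python
import os
import shutil


def stage_to_local(src_path, local_dir):
    """Copy a file from the shared file system to node-local storage once.

    Returns the local path; later reads use the local copy instead of
    pulling the data across the network again.
    """
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, os.path.basename(src_path))
    if not os.path.exists(local_path):
        partial = local_path + ".partial"
        shutil.copyfile(src_path, partial)
        os.replace(partial, local_path)  # rename so readers never see a half-written file
    return local_path


# Hypothetical usage: LOCAL_SCRATCH is an assumed environment variable naming the
# node-local directory; the dataset path is illustrative only.
# local = stage_to_local("/shared/project/frame_tiles.bin",
#                        os.environ.get("LOCAL_SCRATCH", "/tmp/staging"))
```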
In 2017, MENNDL ran for 24 hours and analyzed two networks per GPU per hour on the OLCF’s 27-petaflop Cray XK7 Titan supercomputer. On Summit, MENNDL analyzed approximately 37 networks per GPU per hour. The team estimates that the code will achieve a peak performance of 167 petaflops, or 167 quadrillion calculations per second, running on all 27,648 of Summit’s NVIDIA V100 GPUs.
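The figures in this paragraph support some quick back-of-the-envelope arithmetic. The sketch below only rearranges numbers already quoted; it assumes the projected 167 petaflops is spread evenly over all 27,648 GPUs, which is a simplification. The per-GPU gain reflects both newer GPUs and software changes, so it is not a pure hardware comparison.

```python
TITAN_NETS_PER_GPU_HOUR = 2
SUMMIT_NETS_PER_GPU_HOUR = 37
SUMMIT_GPUS = 27_648
PROJECTED_PEAK_FLOPS = 167e15  # 167 petaflops = 1.67 x 10^17 calculations per second

per_gpu_speedup = SUMMIT_NETS_PER_GPU_HOUR / TITAN_NETS_PER_GPU_HOUR
flops_per_gpu = PROJECTED_PEAK_FLOPS / SUMMIT_GPUS

print(f"Per-GPU throughput gain over Titan: {per_gpu_speedup:.1f}x")  # 18.5x
print(f"Share of projected peak per V100: {flops_per_gpu / 1e12:.2f} teraflops")  # 6.04
```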
“There were some unknowns, in regard to running on such a new system. It’s like test driving a prototype car on a racetrack, in a sense,” Patton said. “But running on a machine that hasn’t even been accepted yet and getting this kind of result in the end, it’s pretty stunning.”
Now that MENNDL has scaled to Summit, the team members will continue to apply the algorithm to other scientific domains and expand on their work in microscopy.
“We are going to be able to do things on Summit in minutes that would have taken hours on Titan,” Patton said. “Tools like MENNDL give experimentalists hope, because the astronomical amounts of data they’ve generated are now potentially useful to discover something new.”
The OLCF is a DOE User Facility located at ORNL.
This material is based upon work supported by the US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Robinson Pino, program manager, under contract number DE-AC05-00OR22725.
This research used resources of the OLCF, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
ORNL is managed by UT-Battelle for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit .