Steffen Schotthoefer

Mathematics of Low-Rank Training and Fine-Tuning of Neural Networks


Abstract:

Low-rank adaptation (LoRA) has become the de facto state-of-the-art method for parameter-efficient fine-tuning of large-scale, pre-trained neural networks. Similarly, low-rank compression of pre-trained networks has become a widely adopted technique for reducing the parameter count of networks for fast inference on resource-constrained devices. Low-rank methods rest on the assumption that the weight matrices of overparametrized neural networks have low rank, so a factorization of the weight layers based on truncated singular value decompositions can be employed to reduce the memory footprint of the network. However, LoRA and its extensions face several challenges in practice, including the need for rank adaptivity, robustness, and computational efficiency during fine-tuning. In this talk, Dr. Schotthoefer investigates mathematical concepts of low-rank training and uses the resulting insights to design efficient and robust low-rank training algorithms.
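To illustrate the idea the abstract refers to, here is a minimal numpy sketch (illustrative only, not the speaker's method) of compressing a weight matrix with a truncated SVD: a matrix whose rank is at most r can be stored exactly as two thin factors, halving (or better) the parameter count. LoRA applies the same factored parameterization to the fine-tuning update rather than to the pre-trained weights themselves.

```python
import numpy as np

# Build a 256x256 weight matrix of rank at most 64 (illustrative data).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 64)) @ rng.standard_normal((64, 256))

# Truncated SVD: keep the r leading singular triplets.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 64
A = U[:, :r] * s[:r]   # 256 x r factor (singular values folded in)
B = Vt[:r, :]          # r x 256 factor

W_approx = A @ B

params_full = W.size                # 256 * 256 = 65536
params_lowrank = A.size + B.size    # 2 * 256 * 64 = 32768
print(params_full, params_lowrank)
print(np.allclose(W, W_approx))     # reconstruction is (numerically) exact here,
                                    # since rank(W) <= r
```

Here the factorization is lossless because the true rank does not exceed the truncation rank; for full-rank weights, truncating at rank r instead yields the best rank-r approximation in the Frobenius norm, trading accuracy for memory.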


Speaker’s Bio:

Dr. Steffen Schotthoefer is the current Householder Fellow in the Mathematics in Computation Section at Oak Ridge National Laboratory (ORNL), affiliated with the Multiscale Methods and Dynamics Group. Steffen's work centers on creating efficient numerical methods for training and fine-tuning artificial intelligence models in resource-limited environments and at large scales. He investigates low-rank methods for model compression to minimize the computational cost of neural network training and inference. In addition, Steffen develops neural network-based surrogate models for scientific domains such as radiation transport and plasma dynamics. His research aims to tackle the challenges posed by memory and communication bottlenecks in large-scale simulations. Prior to joining ORNL, Steffen completed his Ph.D. in Applied Mathematics at Karlsruhe Institute of Technology, Germany, focusing on neural network-based surrogate modeling for radiation transport. During his doctoral studies, he devised numerical methods for the simulation of kinetic partial differential equations and neural network training, establishing the foundation for his current research.

January 23
3:15pm - 4:15pm
C101 5600