Collaborative Research: Robust Acceleration and Preconditioning Methods for Data-Related Applications: Theory and Practice

Project: Research project

Project Details

Description

In many disciplines of science and engineering one often encounters sequences of numbers or vectors or other mathematical objects. A common goal in these situations is to obtain the limit of the sequence inexpensively. As a simple example, there are several ways to generate a sequence of numbers that converge to the number pi and some sequences will reach the limit pi rather quickly. In some cases, it may be possible to modify the original method that produced the sequence to obtain a faster converging one. However, this is not always possible or cost-effective because the process by which the sequence is produced is not explicit or it may be too cumbersome for this approach to be practical. Another common solution is to transform the sequence, by 'accelerating it'. This usually entails combining the terms of the sequence to produce new entries that reach the limit faster. So far, acceleration methods were developed by mathematicians and (quantum) physicists to deal with a wide range of problems from the physical sciences. The primary goal of this project is to study such acceleration methods and to adapt them to the modern era by focusing on topics related to machine learning and, more generally, data sciences. This collaborative research project between researchers at Emory University and the University of Minnesota, will develop and study theoretically a number of robust acceleration algorithms with an emphasis on problems that stem from data-related applications, as well as train graduate students in this field of study. The need to accelerate numerical sequences of various types has frequently been felt across a wide range of disciplines and it has been addressed by many researchers for quite some time. In the past, such acceleration or extrapolation schemes were targeted mainly toward sequences of vectors that arise from physical simulations, e.g., the sequence of potentials generated by the Self Consistent Field (SCF) iterations in quantum physics. In recent years, the rapid expansion of machine learning methodologies across a great variety of disciplines has generated new demand for algorithms to accelerate sequences of various types. However, the new type of sequences encountered in these applications differ in fundamental ways from their analogues in physical simulations. In contrast with the common setting required in, e.g., quantum physics, calculations in machine learning are often performed in single or half precision instead of double precision. Furthermore, in neural networks these sequences tend to be very irregular because they originate from stochastic gradient approaches. The collaborative team from Emory University and the University of Minnesota, will develop and study theoretically a number of robust acceleration algorithms with an emphasis on their application to irregular sequences such as those encountered in data-related applications. The investigating team will study a number of strategies for improving the robustness of standard acceleration schemes, such as Anderson mixing, or the epsilon algorithm. A number of recently advocated second-order methods, based on (so-called) momentum ideas, have been shown to be of great help in accelerating standard stochastic gradient descent methods in Deep Learning. The investigators will add to these schemes another method of the same class that is grounded in Chebyshev acceleration. One advantage of the Chebyshev-based scheme is that it is fairly easy to study theoretically in part because it is well understood for linear problems. In a second research direction, the investigating team will study acceleration methods in the specific context of machine learning tasks. For example, inspired by recent advances in parameter averaging and variance reduction schemes they will explore the application of classical extrapolation methods to stochastic gradient sequences as an adaptive extrapolation procedure. Experiments show that parameter averaging is key to the success of acceleration procedures. Just as important is the selection of the vectors to accelerate. Along the same lines, robust acceleration schemes will be designed and studied for sequences hampered by low precision arithmetic.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
StatusActive
Effective start/end date9/1/228/31/25

Funding

  • National Science Foundation: $200,000.00

Fingerprint

Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.