Paper ID | MLSP-33.1 |
Paper Title | IMPROVED STEP-SIZE SCHEDULES FOR NOISY GRADIENT METHODS |
Authors | Sarit Khirirat, Xiaoyu Wang, KTH Royal Institute of Technology, Sweden; Sindri Magnússon, Stockholm University, Sweden; Mikael Johansson, KTH Royal Institute of Technology, Sweden |
Session | MLSP-33: Optimization Methods |
Location | Gather.Town |
Session Time | Thursday, 10 June, 15:30 - 16:15 |
Presentation Time | Thursday, 10 June, 15:30 - 16:15 |
Presentation | Poster |
Topic | Machine Learning for Signal Processing: [MLR-DFED] Distributed/Federated learning |
Abstract | Noise is inherent in many optimization methods, such as stochastic gradient methods, zeroth-order methods, and compressed gradient methods. For such methods to converge toward a global optimum, it is intuitive to use large step-sizes in the initial iterations, when the noise is typically small compared to the algorithm steps, and to reduce the step-sizes as the algorithm progresses. This intuition has been confirmed in theory and practice for stochastic gradient methods, but similar results are lacking for other methods that use approximate gradients. This paper shows that diminishing step-size strategies can indeed be applied to a broad class of noisy gradient methods. Unlike previous works, our analysis framework shows that such step-size schedules enable these methods to enjoy an optimal $\mathcal{O}(1/k)$ rate. We exemplify our results on zeroth-order methods and stochastic compression methods. Our experiments validate the fast convergence of these methods with step-decay schedules. |
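To make the abstract's setting concrete, the sketch below runs gradient descent on a toy quadratic with two of the noise sources it names (a two-point zeroth-order gradient estimate and a randomized sparsifying compressor) under a step-decay schedule. This is a minimal illustration, not the authors' algorithm: the objective, problem dimension, decay interval, and all numerical constants are assumptions chosen only for the demo.

```python
import numpy as np

# Hypothetical sketch (not the paper's algorithm or constants): gradient
# descent on the toy quadratic f(x) = 0.5 * ||x||^2 with two noise sources
# named in the abstract -- a two-point zeroth-order gradient estimate and a
# randomized sparsifying "compressor" -- driven by a step-decay schedule
# that halves the step-size at fixed intervals.

rng = np.random.default_rng(0)
d = 10                                   # illustrative problem dimension

def f(x):
    return 0.5 * x @ x                   # toy objective

def zeroth_order_grad(x, mu=1e-3):
    """Two-point finite-difference gradient estimate along a random direction."""
    u = rng.standard_normal(d)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

def compressed_grad(x, keep=3):
    """Keep `keep` random coordinates of the true gradient, rescaled to stay unbiased."""
    g = x                                # exact gradient of f
    mask = np.zeros(d)
    mask[rng.choice(d, size=keep, replace=False)] = d / keep
    return mask * g

for noisy_grad in (zeroth_order_grad, compressed_grad):
    x = rng.standard_normal(d)
    gamma0, decay, interval = 0.1, 0.5, 200   # hand-picked for this toy problem
    for k in range(2000):
        gamma_k = gamma0 * decay ** (k // interval)   # step-decay schedule
        x = x - gamma_k * noisy_grad(x)
    print(f"{noisy_grad.__name__:>18}: final f(x) = {f(x):.3e}")
```

Both loops shrink the step-size geometrically over time, which mirrors the intuition in the abstract: large steps while the noise is small relative to the progress made, smaller steps as the iterates approach the optimum.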