IIIT Hyderabad Publications
Novel Stochastic Solvers for Image Classification, Generation, and Further Explorations

Author: Neel Mishra (2020701017)
Date: 2023-11-07
Report no: IIIT/TH/2023/158
Advisor: Pawan Kumar

Abstract

This thesis presents research on deep learning optimization, emphasizing three separate yet related topics: adaptive learning rates, first-order optimization for generative adversarial networks (GANs), and the effects of label smoothing.

First, we propose a novel approach for obtaining adaptive learning rates in gradient-based descent methods for classification tasks. Departing from traditional methods that rely on decayed expectations of gradient-based terms, our approach leverages the angle between the current gradient and a new gradient computed along an orthogonal direction. By incorporating the history of these angles, we determine adaptive learning rates that achieve higher accuracy than existing state-of-the-art optimizers. We provide empirical evidence of convergence and evaluate the approach on diverse benchmark datasets using prominent image classification architectures.

Second, we introduce a first-order optimization method tailored specifically for training GANs. Our method builds upon the Gauss-Newton method to approximate the min-max Hessian and uses the Sherman-Morrison inversion formula to compute the inverse. Operating as a fixed-point method that ensures the necessary contraction, our approach produces high-fidelity images with enhanced diversity across multiple datasets. Notably, it outperforms state-of-the-art second-order methods, including achieving the highest inception score on CIFAR10, while its execution time remains comparable to first-order min-max methods.

Finally, we investigate the effects of label smoothing on GAN training across various optimizer variants and learning rates. Our experiments show that label smoothing combined with a high learning rate and the CGD optimizer yields results surpassing those attained by ADAM at the same learning rate. Importantly, we establish that label smoothing plays a vital role: without it, comparable results are not obtained. We also explore the impact of architectural changes on the generator's conditioning, providing insight into the factors influencing GAN performance.

Our research advances the field of deep learning optimization by delving into these interconnected areas. We present novel methodologies for adaptive learning rates and first-order GAN optimization, and we demonstrate the importance of label smoothing. These advances offer improved accuracy in classification tasks, enhanced image generation quality, and a deeper understanding of the nuances of GAN training.
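To make the angle-based adaptive step size concrete, here is a minimal sketch under our own assumptions: the abstract does not specify the exact probing direction, angle-history weighting, or scaling function, so `angle_adaptive_sgd` and its cosine-based scaling are hypothetical illustrations of the general idea rather than the thesis's method.

```python
import numpy as np

def angle_adaptive_sgd(grad_fn, w0, base_lr=0.1, beta=0.9, eps=1e-3, steps=100):
    """Hypothetical sketch of an angle-based adaptive learning rate.

    grad_fn(w) returns the gradient of the loss at w. Each step probes
    the gradient at a small displacement along a direction orthogonal
    to the current gradient, measures the angle between the two
    gradients, and keeps an exponential moving average of that angle
    (the "angle history") to scale the step size. The cosine scaling
    rule below is an illustrative guess, not the thesis's formula.
    """
    w = w0.copy()
    avg_angle = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        # Build a direction orthogonal to g: random vector, g-component removed.
        u = np.random.randn(*g.shape)
        u -= (u @ g) / (g @ g + 1e-12) * g
        u /= np.linalg.norm(u) + 1e-12
        g_orth = grad_fn(w + eps * u)  # probe gradient in the orthogonal direction
        cos = (g @ g_orth) / (np.linalg.norm(g) * np.linalg.norm(g_orth) + 1e-12)
        angle = np.arccos(np.clip(cos, -1.0, 1.0))
        avg_angle = beta * avg_angle + (1 - beta) * angle  # angle history
        # Shrink the step when successive gradients disagree; floor keeps lr > 0.
        lr = base_lr * max(np.cos(avg_angle), 0.1)
        w -= lr * g
    return w

# Example: minimize 0.5 * ||w||^2, whose gradient is w itself.
w_star = angle_adaptive_sgd(lambda w: w, np.ones(10))
```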
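The Sherman-Morrison identity behind the GAN solver is standard and worth stating concretely. Assuming the approximate min-max Hessian carries a rank-one correction A + uv^T (the abstract does not detail how the thesis structures this term), its inverse follows from A^{-1} without refactorization:

```python
import numpy as np

def sherman_morrison_inverse(A_inv, u, v):
    """Inverse of (A + u v^T) given A^{-1}, via the Sherman-Morrison formula:

        (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)

    Applying the inverse this way costs O(n^2) (or O(n) when A is
    diagonal) instead of a full O(n^3) factorization, which is what
    makes such a rank-one Hessian approximation cheap to invert.
    """
    Au = A_inv @ u
    vA = v @ A_inv
    denom = 1.0 + v @ Au
    return A_inv - np.outer(Au, vA) / denom

# Quick check against a direct inverse.
rng = np.random.default_rng(0)
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
u, v = rng.standard_normal(4), rng.standard_normal(4)
assert np.allclose(sherman_morrison_inverse(np.linalg.inv(A), u, v),
                   np.linalg.inv(A + np.outer(u, v)))
```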
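Label smoothing in a GAN discriminator can likewise be sketched briefly. The one-sided variant below (real targets softened from 1.0 to 0.9, fake targets left at 0.0) is a common stabilization trick and is only an assumption here; the abstract does not specify the thesis's exact smoothing scheme or value.

```python
import torch
import torch.nn.functional as F

def d_loss_label_smoothing(d_real_logits, d_fake_logits, smooth=0.9):
    """Discriminator loss with one-sided label smoothing (assumed form).

    Softening only the real targets discourages the discriminator from
    becoming overconfident, which in turn keeps the gradients fed to
    the generator informative.
    """
    real_targets = torch.full_like(d_real_logits, smooth)  # 0.9 instead of 1.0
    fake_targets = torch.zeros_like(d_fake_logits)         # fakes stay at 0.0
    return (F.binary_cross_entropy_with_logits(d_real_logits, real_targets)
            + F.binary_cross_entropy_with_logits(d_fake_logits, fake_targets))
```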
Full thesis: pdf

Centre for Security, Theory and Algorithms