In this paper, we study the convergence properties of Adam-family methods for nonsmooth optimization, with a focus on training nonsmooth neural networks. We introduce a novel framework built on a two-timescale updating scheme and prove its convergence under mild assumptions. The framework encompasses various popular Adam-family methods and thereby provides convergence guarantees for these methods in training nonsmooth neural networks. Furthermore, we develop stochastic subgradient methods that incorporate gradient clipping for training nonsmooth neural networks with heavy-tailed noise. Using our framework, we show that the proposed methods converge even when the evaluation noise is only assumed to be integrable. Extensive numerical experiments demonstrate the high efficiency and robustness of the proposed methods.
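To make the ingredients named in the abstract concrete, the following is a minimal sketch of an Adam-type stochastic subgradient iteration with norm clipping, applied to the nonsmooth toy objective f(x) = sum_i |x_i - target_i|. It is not the algorithm analyzed in the paper: the function names (`clip`, `subgradient`, `clipped_adam`), the step-size schedules `eta` and `theta`, the clipping threshold `tau`, and the symmetric Pareto noise model (integrable, with infinite variance) are all assumptions chosen for illustration.

```python
import math
import random


def clip(g, tau):
    """Rescale the vector g so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(v * v for v in g))
    if norm <= tau:
        return list(g)
    return [v * tau / norm for v in g]


def subgradient(x, target):
    """One subgradient of f(x) = sum_i |x_i - target_i| (0 is chosen at kinks)."""
    return [(xi > ti) - (xi < ti) for xi, ti in zip(x, target)]


def clipped_adam(x0, target, steps=2000, tau=1.0, eps=1e-8, seed=0):
    """Adam-type iteration driven by clipped, noisy subgradients (illustrative only)."""
    rng = random.Random(seed)
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (momentum) estimate
    v = [0.0] * len(x)  # second-moment estimate
    for k in range(1, steps + 1):
        # Assumed schedules: the iterate step eta decays faster than the
        # averaging weight theta, so the moments evolve on a faster timescale.
        eta = 0.5 / k
        theta = 1.0 / math.sqrt(k)
        # Heavy-tailed noise: random sign times a Pareto(1.5) variable,
        # which has a finite mean but infinite variance.
        noise = [0.1 * rng.choice((-1.0, 1.0)) * rng.paretovariate(1.5) for _ in x]
        g = [gi + ni for gi, ni in zip(subgradient(x, target), noise)]
        g = clip(g, tau)
        m = [mi + theta * (gi - mi) for mi, gi in zip(m, g)]
        v = [vi + theta * (gi * gi - vi) for vi, gi in zip(v, g)]
        x = [xi - eta * mi / (math.sqrt(vi) + eps) for xi, mi, vi in zip(x, m, v)]
    return x
```

A call such as `clipped_adam([2.0, -3.0], [0.5, 0.5])` returns the final iterate as a list of floats. Clipping bounds every update regardless of how large a single noise sample is, which is the role it plays under heavy-tailed noise.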