Stochastic learning dynamics in deep neural networks that are based on Langevin or Lévy stochastic differential equations (SDEs) control the noise variance either by varying the mini-batch size or by directly adjusting the injected noise. Because the noise variance affects approximation performance, the design of the additive noise is important both in SDE-based learning and in its practical implementation. In this paper, we propose an alternative stochastic descent learning equation based on quantized optimization for non-convex objective functions, adopting a stochastic analysis perspective. The proposed method employs a quantized optimization approach that uses Langevin SDE dynamics, which allows controllable noise with an identical distribution without additive noise or mini-batch size adjustment. Numerical experiments demonstrate the effectiveness of the proposed algorithm on vanilla convolutional neural network (CNN) models and the ResNet-50 architecture across various data sets. Furthermore, we provide a simple PyTorch implementation of the proposed algorithm.
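To make the idea of quantization acting as a noise source concrete, the following is a minimal one-dimensional sketch and not the algorithm proposed in the paper. It assumes a uniform grid with stochastic rounding, a hypothetical geometric schedule that refines the grid over time, and a toy non-convex objective; the names `stochastic_round`, `quantized_descent`, `base_step`, and `decay` are illustrative only. With stochastic rounding, the rounding error is zero-mean and its variance is at most a quarter of the squared grid spacing, so the grid spacing alone sets the noise level, with no separately injected noise term and no change of mini-batch size.

```python
import math
import random


def stochastic_round(value, step, rng):
    """Round `value` to a multiple of `step`, up or down at random (unbiased)."""
    scaled = value / step
    lower = math.floor(scaled)
    # Round up with probability equal to the fractional part, so that the
    # expected result equals `value`.
    if rng.random() < scaled - lower:
        lower += 1
    return lower * step


def quantized_descent(grad, x0, lr=0.05, steps=200,
                      base_step=0.5, decay=0.98, seed=0):
    """Gradient descent whose iterate is re-quantized after every update.

    The grid spacing shrinks geometrically (an assumed schedule), so the
    rounding noise is large early on and fades as the search proceeds.
    """
    rng = random.Random(seed)
    x = x0
    for t in range(steps):
        step = base_step * decay ** t      # current grid spacing
        x = stochastic_round(x - lr * grad(x), step, rng)
    return x


def grad_f(x):
    """Gradient of the toy non-convex objective f(x) = x**4 - 3*x**2 + x."""
    return 4 * x ** 3 - 6 * x + 1


if __name__ == "__main__":
    x_final = quantized_descent(grad_f, x0=2.0)
    print("final iterate:", x_final, "gradient:", grad_f(x_final))
```

In this sketch the only tunable noise parameter is the grid spacing, which mirrors the role that mini-batch size or injected-noise scale plays in the SDE-based methods described above. A PyTorch version would apply the same rounding elementwise to parameter tensors, but that extension is not shown here.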