Stochastic learning dynamics based on Langevin or Lévy stochastic differential equations (SDEs) in deep neural networks control the noise variance either by varying the mini-batch size or by directly varying the injected noise. Since the noise variance affects the approximation performance, the design of the additive noise is significant in SDE-based learning and its practical implementation. In this paper, we propose an alternative stochastic descent learning equation based on quantized optimization for non-convex objective functions, adopting a stochastic analysis perspective. The proposed method employs a quantized optimization approach that utilizes Langevin SDE dynamics, allowing for controllable noise with an identical distribution without the need for additive noise or adjusting the mini-batch size. Numerical experiments demonstrate the effectiveness of the proposed algorithm on vanilla convolutional neural network (CNN) models and the ResNet-50 architecture across various data sets. Furthermore, we provide a simple PyTorch implementation of the proposed algorithm.
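To make the idea of noise that comes from quantization rather than from an added noise term more concrete, the following is a minimal sketch and not the authors' implementation. It is written in plain Python instead of PyTorch so that it stays self-contained. The rounding quantizer, the dither, the resolution schedule (doubling every 20 steps), the toy objective, and the names `quantize`, `quantized_descent`, and `toy_grad` are all illustrative assumptions that do not come from the abstract.

```python
import math
import random


def quantize(x, resolution):
    """Round x to the nearest point of a uniform grid with spacing 1/resolution."""
    return math.floor(x * resolution + 0.5) / resolution


def quantized_descent(grad, x0, lr=0.05, steps=200, base=2.0, seed=0):
    """Gradient descent whose search step is quantized at every iteration.

    The rounding error stands in for the noise term: it is bounded by half
    the grid spacing, so raising the resolution shrinks the noise.
    The schedule (resolution multiplied by `base` every 20 steps) is a
    hypothetical choice for illustration only.
    """
    rng = random.Random(seed)
    x = x0
    for t in range(steps):
        resolution = base ** (1 + t // 20)
        # Assumption: a uniform dither of at most half a grid cell is added
        # before rounding so that small steps are not always rounded to zero.
        dither = (rng.random() - 0.5) / resolution
        x = x + quantize(-lr * grad(x) + dither, resolution)
    return x


def toy_grad(x):
    """Gradient of the toy non-convex objective f(x) = x**2 / 2 + 2 * cos(3 * x)."""
    return x - 6.0 * math.sin(3.0 * x)


x_final = quantized_descent(toy_grad, x0=3.0)
print(x_final)
```

In this sketch the only source of randomness besides the dither is the rounding itself, and the single `resolution` parameter controls its magnitude, which mirrors the abstract's claim that the noise is controllable without changing the mini-batch size. How the actual method defines the quantizer and its schedule should be taken from the paper and its PyTorch code.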