In this paper, we consider Discretized Neural Networks (DNNs) consisting of low-precision weights and activations. During training, these networks suffer from either infinite or zero gradients because the discrete function is non-differentiable. In this setting, most training-based DNNs use the standard Straight-Through Estimator (STE) to approximate the gradient w.r.t. discrete values. However, the STE causes gradient mismatch, meaning that the approximated gradient contains perturbations. Through the lens of duality theory, we present an analysis showing that this mismatch can be viewed as a metric perturbation in a Riemannian manifold. To address this problem, we draw on information geometry to construct the Linearly Nearly Euclidean (LNE) manifold for DNNs as a background for handling perturbations. By introducing the Ricci flow, a partial differential equation on metrics, we prove the dynamical stability and convergence of the LNE metric under $L^2$-norm perturbations. Previous perturbation theory gives convergence rates in fractional powers. In contrast, we show that the metric perturbation under the Ricci flow decays exponentially in the LNE manifold. Experimental results on various datasets demonstrate that our method achieves better and more stable performance for DNNs than other representative training-based methods.
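The following minimal sketch illustrates the gradient mismatch described above. It uses the common clipped STE for a sign binarizer. The function names, the clip threshold of 1, and the convention that sign(0) = +1 are illustrative assumptions. This is the standard baseline estimator, not the LNE/Ricci-flow method proposed in this paper.

```python
import numpy as np

def sign_forward(x):
    # Binarize to {-1, +1}; mapping 0 to +1 is an assumed convention.
    return np.where(x >= 0, 1.0, -1.0)

def sign_true_grad(x):
    # The derivative of sign(x) is 0 wherever it exists (it is undefined at 0),
    # so exact backpropagation through the quantizer would pass no signal.
    return np.zeros_like(x)

def ste_grad(x, upstream, clip=1.0):
    # Clipped straight-through estimator: treat d sign(x)/dx as 1 for |x| <= clip
    # and 0 otherwise, then pass the upstream gradient through.
    return upstream * (np.abs(x) <= clip)

x = np.array([-1.5, -0.3, 0.2, 0.9, 2.0])
upstream = np.ones_like(x)

q = sign_forward(x)                  # binarized values: -1, -1, +1, +1, +1
g_ste = ste_grad(x, upstream)        # surrogate gradient: 0, 1, 1, 1, 0
g_true = sign_true_grad(x) * upstream

# Gradient mismatch: the gap between the STE surrogate and the true gradient.
mismatch = g_ste - g_true
print(q, g_ste, mismatch)
```

The mismatch vector is nonzero exactly where the STE passes gradient through the quantizer. This nonzero gap is the perturbation that the paper reinterprets geometrically.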