In this paper, we consider Discretized Neural Networks (DNNs), which consist of low-precision weights and activations and whose gradients are either zero or infinite during training because the discretization function is non-differentiable. Most training-based DNNs therefore employ the standard Straight-Through Estimator (STE) to approximate the gradient with respect to the discrete values. However, the STE introduces a gradient mismatch, caused by perturbations in the approximated gradient. To address this problem, we show, through the lens of duality theory, that this mismatch can be viewed as a metric perturbation in a Riemannian manifold. Further, building on information geometry, we construct the Linearly Nearly Euclidean (LNE) manifold for DNNs as the background on which to handle perturbations. By introducing a partial differential equation on metrics, namely the Ricci flow, we prove the dynamical stability and convergence of the LNE metric under $L^2$-norm perturbations. Unlike previous perturbation theory, whose convergence rate is a fractional power, the metric perturbation under the Ricci flow decays exponentially in the LNE manifold. Experimental results on various datasets demonstrate that our method achieves better and more stable performance for DNNs than other representative training-based methods.
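To make the STE and the resulting gradient mismatch concrete, the following is a minimal sketch of a straight-through update on a single scalar weight. It is not the method proposed in this paper: the ternary levels, the clipping threshold, the squared-error loss, and the names `quantize`, `ste_grad`, and `train_step` are all assumptions chosen for illustration. The true derivative of `quantize` is zero almost everywhere, so the backward pass substitutes the identity, and the gradient applied to `w` is only an approximation.

```python
def quantize(w, levels=(-1.0, 0.0, 1.0)):
    """Map a real-valued weight to the nearest discrete level (assumed ternary)."""
    return min(levels, key=lambda q: abs(q - w))


def ste_grad(w, upstream, clip=1.0):
    """Straight-through backward pass.

    Treats quantize() as the identity inside [-clip, clip] and blocks the
    gradient outside that range (a common clipped-STE convention, assumed here).
    """
    return upstream if abs(w) <= clip else 0.0


def train_step(w, x, y, lr=0.1):
    """One SGD step on the loss 0.5 * (quantize(w) * x - y) ** 2 using the STE."""
    q = quantize(w)               # forward pass uses the discrete weight
    err = q * x - y
    grad_q = err * x              # exact gradient w.r.t. the discrete weight q
    grad_w = ste_grad(w, grad_q)  # STE: reuse grad_q as the gradient w.r.t. w
    return w - lr * grad_w        # update the latent real-valued weight


w = 0.2
for _ in range(20):
    w = train_step(w, x=1.0, y=1.0)
print(w, quantize(w))
```

In this toy run the latent weight keeps moving while its quantized value stays at 0, and the loss only changes once `w` crosses the rounding boundary. That gap between the gradient used and the true (zero) gradient is one simple picture of the mismatch described above.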