In this paper, we study Discretized Neural Networks (DNNs) composed of low-precision weights and activations. Because the discretization function is non-differentiable, its gradient during training is either zero or infinite. Most training-based approaches for DNNs in this setting therefore use the standard Straight-Through Estimator (STE) to approximate the gradient with respect to the discrete values. However, STE introduces gradient mismatch, which stems from perturbations in the approximated gradient. To address this problem, we show through the lens of duality theory that this mismatch can be interpreted as a metric perturbation on a Riemannian manifold. Building on information geometry, we construct the Linearly Nearly Euclidean (LNE) manifold for DNNs, which provides a setting in which such perturbations can be analyzed. By introducing the Ricci flow, a partial differential equation on metrics, we establish the dynamical stability and convergence of the LNE metric under $L^2$-norm perturbations. In contrast to previous perturbation theories, whose convergence rates are fractional powers, the metric perturbation under the Ricci flow decays exponentially on the LNE manifold. Experimental results across various datasets demonstrate that our method achieves superior and more stable performance for DNNs compared with other representative training-based methods.
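To make the zero-gradient problem and the STE workaround concrete, the following is a minimal sketch of the standard STE baseline, not of the LNE/Ricci-flow method itself. It assumes a sign binarizer and the commonly used *clipped* STE variant (identity gradient for |w| <= 1, zero otherwise); the one-weight model, the squared loss, and the names `sign_forward`, `sign_true_grad`, `ste_grad`, and `train_step` are all hypothetical choices made for illustration.

```python
def sign_forward(w: float) -> float:
    """Binarize a real-valued weight to {-1, +1}, using sign(0) := +1."""
    return 1.0 if w >= 0.0 else -1.0


def sign_true_grad(w: float) -> float:
    """Exact derivative of sign(w): zero everywhere except w = 0, where it
    is unbounded (represented here as +inf). Useless for gradient descent."""
    return float("inf") if w == 0.0 else 0.0


def ste_grad(upstream: float, w: float, clip: float = 1.0) -> float:
    """Clipped straight-through estimator (an assumed, common variant):
    treat d sign(w)/dw as 1 when |w| <= clip and as 0 otherwise, so the
    upstream gradient is passed through unchanged inside the clip range."""
    return upstream if abs(w) <= clip else 0.0


def train_step(w: float, x: float, target: float, lr: float = 0.1):
    """One SGD step on the hypothetical model y = sign(w) * x with
    squared loss (y - target)^2. Returns (new_w, STE gradient)."""
    wq = sign_forward(w)             # discrete weight used in the forward pass
    y = wq * x
    dloss_dy = 2.0 * (y - target)
    dloss_dwq = dloss_dy * x         # gradient w.r.t. the discrete weight
    dloss_dw = ste_grad(dloss_dwq, w)  # STE replaces the (zero) true derivative
    return w - lr * dloss_dw, dloss_dw


if __name__ == "__main__":
    w = 0.3
    # The exact chain rule would multiply by sign_true_grad(w) == 0.0,
    # so the latent weight would never move.
    print("true d sign/dw at w:", sign_true_grad(w))
    w, g = train_step(w, x=1.0, target=-1.0)
    print("STE gradient:", g, "updated latent weight:", w)
    w, g = train_step(w, x=1.0, target=-1.0)
    print("STE gradient after the sign flip:", g)
```

The gap between the surrogate gradient returned by `ste_grad` and the true derivative returned by `sign_true_grad` is the gradient mismatch the abstract refers to; the paper's contribution is to model this discrepancy as a metric perturbation rather than to change the forward binarization shown here.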