In this paper, we study Discretized Neural Networks (DNNs) composed of low-precision weights and activations, which suffer from either infinite or zero gradients during training due to the non-differentiable discrete function. Most training-based methods for DNNs in such scenarios employ the standard Straight-Through Estimator (STE) to approximate the gradient w.r.t. discrete values. However, the STE introduces the problem of gradient mismatch, arising from perturbations of the approximated gradient. To address this problem, we show that the mismatch can be interpreted, through the lens of duality theory, as a metric perturbation on a Riemannian manifold. Building on information geometry, we construct the Linearly Nearly Euclidean (LNE) manifold for DNNs, which provides a background geometry for analyzing such perturbations. By introducing a partial differential equation on metrics, namely the Ricci flow, we establish the dynamical stability and convergence of the LNE metric under $L^2$-norm perturbations. In contrast to previous perturbation theories, whose convergence rates are fractional powers, the metric perturbation under the Ricci flow decays exponentially on the LNE manifold. Experimental results on various datasets demonstrate that our method achieves superior and more stable performance for DNNs than other representative training-based methods.
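To make the gradient-mismatch setting concrete, the following is a minimal sketch of the standard STE in PyTorch; the sign-based binarizer and the clipped-identity backward rule are common illustrative choices, not necessarily the exact quantizer used in this paper.

```python
# Minimal Straight-Through Estimator (STE) sketch in PyTorch.
# Assumption: a sign() binarizer with the widely used clipped-identity
# surrogate gradient; the paper's quantizer may differ.
import torch

class BinarizeSTE(torch.autograd.Function):
    """Binarize in the forward pass; pass the gradient straight through
    (clipped to |x| <= 1) in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)  # non-differentiable discrete function

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # STE surrogate: treat d(sign)/dx as the identity on |x| <= 1.
        # The gap between this surrogate and the true gradient (zero
        # almost everywhere) is the perturbation discussed above.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

# Usage: a discrete forward pass that still yields usable gradients.
w = torch.randn(4, requires_grad=True)
loss = BinarizeSTE.apply(w).sum()
loss.backward()
print(w.grad)  # nonzero where |w| <= 1, despite the discrete forward
```

The clipping step is what keeps the surrogate gradient bounded; without it, weights far from the quantization thresholds would keep receiving gradient signal for a forward output that can no longer change, worsening the mismatch.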