The discontinuous operations inherent in quantization and sparsification obstruct backpropagation, and the problem is most acute when training deep neural networks in ultra-low-precision and sparse regimes. We propose a robust, general solution: a denoising affine transform that stabilizes training under these conditions. By formulating quantization and sparsification as perturbations injected during training, we derive a perturbation-resilient approach based on ridge regression. Our solution employs a piecewise constant backbone model to guarantee a performance lower bound and includes an inherent noise-reduction mechanism that mitigates perturbation-induced corruption. This formulation allows existing models to be trained at arbitrarily low precision and sparsity with off-the-shelf recipes. It also offers a new perspective on training temporal binary neural networks, contributing to ongoing efforts to narrow the gap between artificial and biological neural networks.
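To make the mechanism concrete, the following is a minimal PyTorch sketch of the general idea under our own assumptions: quantization is modeled as an additive perturbation, a straight-through estimator keeps gradients flowing through the discontinuous rounding step, and a one-dimensional ridge regression fits an affine correction per tensor. The names (`quantize`, `denoising_affine`), the 2-bit uniform quantizer, and the per-tensor closed-form fit are illustrative choices, not the paper's exact formulation.

```python
# Illustrative sketch only: quantization treated as additive perturbation,
# with a ridge-regression-style affine correction fit per forward pass.
import torch


def quantize(x: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Uniform quantizer; its round() is the discontinuous op that blocks gradients."""
    levels = 2 ** bits - 1
    scale = x.abs().max().clamp(min=1e-8) / levels
    return torch.round(x / scale) * scale


def denoising_affine(x: torch.Tensor, lam: float = 1e-2) -> torch.Tensor:
    """Fit x ~ a * q + b by 1-D ridge regression, so an affine map of the
    quantized signal q denoises back toward the clean input x."""
    q = quantize(x)
    # Straight-through estimator: forward uses q, backward sees identity.
    q = x + (q - x).detach()
    # Closed-form ridge regression for slope a and intercept b.
    qm, xm = q.mean(), x.mean()
    cov = ((q - qm) * (x - xm)).mean()
    var = (q - qm).pow(2).mean()
    a = cov / (var + lam)  # lam shrinks a toward 0 as perturbation grows
    b = xm - a * qm        # ... so the output degrades to a constant
    return a * q + b


x = torch.randn(1024, requires_grad=True)
y = denoising_affine(x)
y.sum().backward()          # gradients flow despite the round()
print(x.grad is not None)   # True
```

Note how the ridge penalty `lam` shrinks the fitted slope toward zero when the perturbation dominates, so the output falls back to a constant; this mirrors the role of the piecewise constant backbone as a performance lower bound.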