In this work we introduce methods to reduce the computational and memory costs of training deep neural networks. Our approach consists of replacing exact vector-Jacobian products with randomized, unbiased approximations during backpropagation. We provide a theoretical analysis of the trade-off between the number of epochs needed to reach a target precision and the per-epoch cost reduction. We then identify specific unbiased estimators of vector-Jacobian products for which we establish desirable optimality properties, namely minimal variance under sparsity constraints. Finally, we report in-depth experiments on multi-layer perceptrons, BagNets, and Visual Transformers architectures. These experiments validate our theoretical results and confirm the potential of our proposed unbiased randomized backpropagation approach for reducing the cost of deep learning.
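To make the idea concrete, the sketch below shows one simple unbiased vector-Jacobian product estimator consistent with this description; the notation ($v$, $J$, $p_i$, $b_i$, $k$) and the specific construction are illustrative assumptions, not necessarily the estimator analyzed in the paper. For a layer with Jacobian $J \in \mathbb{R}^{m \times n}$, the exact vector-Jacobian product is $v^\top J$; keeping each coordinate of $v$ independently with probability $p_i$ and rescaling preserves unbiasedness while sparsifying the computation:
\[
  \tilde v_i \;=\; \frac{b_i}{p_i}\, v_i,
  \qquad b_i \sim \mathrm{Bernoulli}(p_i) \text{ independently},
\]
\[
  \mathbb{E}\big[\tilde v^\top J\big]
  \;=\; \sum_{i=1}^{m} \frac{\mathbb{E}[b_i]}{p_i}\, v_i\, J_{i,:}
  \;=\; v^\top J .
\]
Under an expected-sparsity budget $\mathbb{E}\big[\sum_i b_i\big] = k$, choosing $p_i \propto |v_i|$ (clipped at $1$) minimizes the coordinate-wise variance of $\tilde v$ among such Bernoulli-sampling schemes, which matches the flavor of the minimal-variance-under-sparsity property claimed above; whether the paper uses exactly this scheme is an assumption here.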