Supervised learning in Deep Neural Networks (DNNs) is commonly performed using the error Backpropagation (BP) algorithm. The sequential propagation of errors and the transport of weights during the backward pass limit its efficiency and scalability. There is therefore growing interest in finding local alternatives to BP. Recently, methods based on Forward-Mode Automatic Differentiation have been proposed, such as the Forward Gradient algorithm and its variants. However, Forward Gradients suffer from high variance in large DNNs, which impairs convergence. In this paper, we address this high variance and propose the Forward Direct Feedback Alignment (FDFA) algorithm, which combines Activity-Perturbed Forward Gradients with Direct Feedback Alignment and momentum to compute low-variance gradient estimates in DNNs. We provide both theoretical proof and empirical evidence that the proposed method achieves lower variance than previous Forward Gradient techniques. By reducing the variance of gradient estimates, our approach enables faster convergence and better performance than other local alternatives to backpropagation.
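To make the variance issue concrete, the following minimal sketch illustrates the plain weight-perturbed Forward Gradient estimator that the abstract refers to, not the FDFA algorithm itself. It assumes a toy quadratic loss with a hand-written directional derivative; the names `loss_jvp`, `forward_gradient`, and `mean_forward_gradient` are hypothetical and chosen only for illustration. Each estimate scales a random direction `v` by the directional derivative of the loss along `v`, which is unbiased but noisy, so many samples are needed before the average approaches the true gradient.

```python
import random

def loss_jvp(w, target, v):
    """Directional derivative of 0.5 * ||w - target||^2 along v.

    Hand-written for this toy loss (an assumption of the sketch); a real
    network would obtain it from forward-mode automatic differentiation.
    """
    return sum((wi - ti) * vi for wi, ti, vi in zip(w, target, v))

def forward_gradient(w, target, v):
    """Forward Gradient estimate: the direction v scaled by the directional derivative."""
    d = loss_jvp(w, target, v)
    return [d * vi for vi in v]

def mean_forward_gradient(w, target, n_samples, rng):
    """Average n_samples estimates, each using a fresh direction v ~ N(0, I)."""
    total = [0.0] * len(w)
    for _ in range(n_samples):
        v = [rng.gauss(0.0, 1.0) for _ in w]
        g = forward_gradient(w, target, v)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]

# Toy problem: the true gradient of the quadratic loss is simply w - target.
w = [1.0, -2.0, 0.5, 3.0]
target = [0.0, 0.0, 0.0, 0.0]
true_grad = [wi - ti for wi, ti in zip(w, target)]

rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = mean_forward_gradient(w, target, 20000, rng)
```

A single call to `forward_gradient` with one random `v` is typically far from `true_grad`, and the spread of a single estimate grows with the number of perturbed dimensions. This is the motivation for perturbing activations rather than weights and for the additional variance reduction proposed here.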