The gradients used to train neural networks are typically computed using backpropagation. While backpropagation is an efficient way to obtain exact gradients, it is computationally expensive, hinders parallelization, and is considered biologically implausible. Forward gradients approximate the gradient from directional derivatives along random tangents, computed by forward-mode automatic differentiation. So far, research has focused on using a single tangent per step. This paper provides an in-depth analysis of multi-tangent forward gradients and introduces an improved approach to combining the forward gradients of multiple tangents based on orthogonal projections. We demonstrate that increasing the number of tangents improves both approximation quality and optimization performance across a variety of tasks.
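The idea can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it uses a toy quadratic objective whose gradient is known in closed form, so each directional derivative `g·v` stands in for what forward-mode automatic differentiation (a JVP) would return. It contrasts the single-tangent forward gradient with a multi-tangent combination that orthogonally projects the true gradient onto the span of the tangents; the projection formula `V^T (V V^T)^{-1} d` is a standard least-squares projection and is assumed here to represent the orthogonal-projection combination the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 4  # parameter dimension and number of tangents (illustrative sizes)

# Toy objective f(w) = 0.5 * ||w||^2, whose true gradient is w itself.
w = rng.normal(size=n)
true_grad = w

# Draw k random tangents. In practice, each directional derivative
# d_i = grad·v_i would come from one forward-mode AD (JVP) pass;
# here we compute it directly from the known gradient.
V = rng.normal(size=(k, n))       # rows are tangents v_1..v_k
d = V @ true_grad                 # directional derivatives

# Single-tangent forward gradient: (grad·v) v for one random tangent.
fg_single = d[0] * V[0]

# Multi-tangent combination via orthogonal projection onto span{v_1..v_k}:
# solve the k-by-k system (V V^T) c = d, then form V^T c.
fg_proj = V.T @ np.linalg.solve(V @ V.T, d)

# The projection error ||fg_proj - grad|| is the component of the gradient
# orthogonal to the tangent subspace; it shrinks as k grows, and with
# k = n linearly independent tangents the true gradient is recovered.
err_single = np.linalg.norm(fg_single - true_grad)
err_proj = np.linalg.norm(fg_proj - true_grad)
```

With more tangents the projected estimate captures a larger share of the gradient, which mirrors the abstract's claim that approximation quality improves with the number of tangents.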