Forward Gradients, the idea of using directional derivatives computed in forward differentiation mode, have recently been shown to be usable for neural network training while avoiding problems generally associated with gradient computation by backpropagation, such as locking and memory requirements. The cost is that the step direction must be guessed, which is hard in high dimensions. While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias the gradient guesses toward much more promising directions, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study that systematically covers a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.
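To make the role of the guess direction concrete, the following is a minimal sketch in plain Python, not the setup studied in the paper. It assumes a toy quadratic loss, a hand-written dual-number class (`Dual`) standing in for a forward-mode autodiff library, and a "biased" guess built as the true gradient plus noise, which is only a hypothetical stand-in for feedback from a local auxiliary network. All names here are illustrative. The forward gradient for a guess `v` is taken to be the directional derivative along `v` multiplied by `v`.

```python
import math
import random


class Dual:
    """A value together with its directional derivative (tangent)."""

    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule carries the tangent through the multiplication.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def loss(theta, target):
    """Toy quadratic loss 0.5 * ||theta - target||^2 (an assumption)."""
    total = 0.0
    for t, y in zip(theta, target):
        d = t - y
        total = total + 0.5 * d * d
    return total


def forward_gradient(theta, target, guess):
    """Directional derivative along `guess`, scaled by `guess`.

    Uses one forward pass and no backward pass.
    """
    duals = [Dual(t, v) for t, v in zip(theta, guess)]
    directional = loss(duals, target).tan
    return [directional * v for v in guess]


def unit(vec):
    """Rescale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


rng = random.Random(0)
dim = 1000
theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
target = [0.0] * dim
true_grad = [t - y for t, y in zip(theta, target)]  # known in closed form here

# Isotropic guess: pure Gaussian noise.
random_guess = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
# Biased guess: true gradient plus noise, a hypothetical proxy for a
# gradient obtained from a local loss.
biased_guess = unit([g + rng.gauss(0.0, 1.0) for g in true_grad])

cos_random = cosine(forward_gradient(theta, target, random_guess), true_grad)
cos_biased = cosine(forward_gradient(theta, target, biased_guess), true_grad)
print("alignment with true gradient, random guess:", cos_random)
print("alignment with true gradient, biased guess:", cos_biased)
```

In this sketch the forward gradient always has a non-negative inner product with the true gradient, since that inner product equals the squared directional derivative. How closely it aligns depends entirely on the guess: an isotropic guess in 1000 dimensions is nearly orthogonal to the true gradient, while a guess correlated with it recovers much more of the descent direction.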