Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative to gradient descent, as it can be computed without a backward pass. For a linear model with $d$ parameters, previous work has shown that the prediction error of FGD is, however, slower by a factor of $d$ than the prediction error of stochastic gradient descent (SGD). In this paper we show that by computing $\ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(\ell \wedge d)$, and thus the suboptimality of the rate disappears if $\ell \gtrsim d.$ We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution. The main mathematical challenge lies in controlling the dependencies arising from the repeated sampling process.
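To make the setup concrete, here is a minimal sketch of FGD with $\ell$ repeated steps per sample for linear regression with squared loss. All hyperparameters and the noiseless-label simplification are illustrative choices, not taken from the paper; each update uses only the directional derivative of the loss along a random direction $v$, i.e. a forward-mode quantity that requires no backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples, ell, eta = 5, 2000, 5, 0.01  # illustrative values

theta_star = rng.normal(size=d)  # ground-truth parameters
theta = np.zeros(d)              # FGD iterate

for _ in range(n_samples):
    x = rng.normal(size=d)       # fresh training input
    y = x @ theta_star           # noiseless label (illustrative)
    for _ in range(ell):         # ell FGD steps reusing the same sample
        v = rng.normal(size=d)   # random direction for the forward gradient
        # Directional derivative of the loss (x.theta - y)^2 / 2 along v;
        # in general this is a forward-mode JVP, computed without backprop.
        jvp = (x @ theta - y) * (x @ v)
        theta -= eta * jvp * v   # forward-gradient update: (grad L . v) v
```

Since $\mathbb{E}[vv^\top] = I_d$, the forward gradient $(\nabla L \cdot v)\,v$ is an unbiased estimate of $\nabla L$; its extra variance grows with $d$, which is the source of the factor-$d$ slowdown, and reusing each sample for $\ell$ steps is what averages this directional noise down.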