Predictive coding (PC) is a biologically plausible alternative to standard backpropagation (BP) that minimises an energy function with respect to network activities before updating weights. Recent work has improved the training stability of deep PC networks (PCNs) by leveraging BP-inspired reparameterisations. However, the full scalability and theoretical basis of these approaches remain unclear. To address this, we study the infinite-width and infinite-depth limits of PCNs. For linear residual networks, we show that the set of width- and depth-stable feature-learning parameterisations for PC is exactly the same as for BP. Moreover, under any of these parameterisations, the PC energy with equilibrated activities converges to the BP loss in the regime where the model width is much larger than the depth, so that PC computes the same gradients as BP. Experiments show that these results hold in practice for deep nonlinear networks, as long as an activity equilibrium appears to be reached. Overall, this work unifies various previous theoretical and empirical results and has potentially important implications for the scaling of PCNs.
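To make the PC procedure described above concrete, the following is a minimal sketch (not the paper's implementation) of one PC training step on a deep linear network in NumPy: activities are first equilibrated by gradient descent on the energy with the output clamped to the target, and the weights are then updated with the energy gradient at the equilibrated activities. All names and hyperparameters (width, n_layers, lr_z, the number of inference steps) are illustrative assumptions; residual connections and the stable parameterisations studied in the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
width, n_layers = 64, 4  # illustrative sizes, not from the paper
Ws = [rng.normal(scale=width ** -0.5, size=(width, width)) for _ in range(n_layers)]
x = rng.normal(size=width)  # input
y = rng.normal(size=width)  # target

# Initialise activities with a feedforward pass.
zs = [x]
for W in Ws:
    zs.append(W @ zs[-1])

def energy(zs, Ws):
    # PC energy: half the summed squared prediction errors between layers.
    return 0.5 * sum(np.sum((zs[l + 1] - Ws[l] @ zs[l]) ** 2) for l in range(n_layers))

# Inference phase: minimise the energy w.r.t. the hidden activities,
# with the input z_0 = x and the output z_L clamped to the target y.
zs[-1] = y
lr_z = 0.1
for _ in range(200):
    for l in range(1, n_layers):
        eps_l = zs[l] - Ws[l - 1] @ zs[l - 1]        # prediction error at layer l
        eps_next = zs[l + 1] - Ws[l] @ zs[l]         # prediction error at layer l+1
        zs[l] -= lr_z * (eps_l - Ws[l].T @ eps_next)  # gradient dE/dz_l

# Learning phase: update weights with the energy gradient
# evaluated at the (approximately) equilibrated activities.
lr_w = 1e-3
for l in range(n_layers):
    eps = zs[l + 1] - Ws[l] @ zs[l]
    Ws[l] += lr_w * np.outer(eps, zs[l])  # -dE/dW_l
```

Under the abstract's claim, in the regime where width greatly exceeds depth (and under a stable parameterisation), the equilibrated energy approaches the BP loss, so the weight update in the learning phase would coincide with the corresponding BP gradient step.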