Predictive coding (PC) is a biologically plausible alternative to standard backpropagation (BP) that minimises an energy function with respect to network activities before updating weights. Recent work has improved the training stability of deep PC networks (PCNs) by leveraging BP-inspired reparameterisations. However, how far these methods scale, and on what theoretical basis, remains unclear. To address this gap, we study the infinite width and depth limits of PCNs. For linear residual networks, we show that the set of width- and depth-stable feature-learning parameterisations for PC is exactly the same as for BP. Moreover, under any of these parameterisations, the PC energy with equilibrated activities converges to the quadratic BP loss when the model width is much larger than the depth, so that PC computes the same gradients as BP. Experiments show that, as long as an activity equilibrium is reached, convergence to BP also holds for nonlinear models, including convolutional networks and transformers. Overall, this work constrains the types of parameterisation that are scalable with PC, while showing a way in which BP can be effectively implemented with only local updates in networks that are much wider than they are deep, such as the brain.
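To make the two-phase procedure concrete, the following is a minimal NumPy sketch of PC on a small linear residual network: activities are first relaxed by gradient descent on the energy, and only then are the local weight gradients read off and compared with BP. The layer sizes, the residual scaling `alpha`, the initialisation scales, the step size and all function names are illustrative assumptions, not the parameterisation or code from the paper.

```python
import numpy as np

def is_residual(l, n_layers):
    # Assumption: only the square hidden layers carry a skip connection.
    return 0 < l < n_layers - 1

def predict(W, below, residual, alpha):
    """Prediction of the next layer's activity from the layer below."""
    return below + alpha * (W @ below) if residual else W @ below

def back(W, e, residual, alpha):
    """Transpose-Jacobian product: sends an error one layer down."""
    return e + alpha * (W.T @ e) if residual else W.T @ e

def feedforward(weights, x, alpha):
    acts = [x]
    for l, W in enumerate(weights):
        acts.append(predict(W, acts[-1], is_residual(l, len(weights)), alpha))
    return acts

def errors(weights, acts, alpha):
    n = len(weights)
    return [acts[l + 1] - predict(W, acts[l], is_residual(l, n), alpha)
            for l, W in enumerate(weights)]

def energy(weights, acts, alpha):
    # PC energy: half the sum of squared layer-wise prediction errors.
    return 0.5 * sum(float(e @ e) for e in errors(weights, acts, alpha))

def infer(weights, x, y, alpha, lr=0.05, steps=200):
    """Relax hidden activities by gradient descent on the energy."""
    n = len(weights)
    acts = feedforward(weights, x, alpha)
    acts[-1] = y  # clamp the output layer to the target
    for _ in range(steps):
        errs = errors(weights, acts, alpha)
        for l in range(1, n):  # input (l=0) and output (l=n) stay fixed
            grad = errs[l - 1] - back(weights[l], errs[l], is_residual(l, n), alpha)
            acts[l] = acts[l] - lr * grad
    return acts

def pc_grads(weights, acts, alpha):
    """Local weight gradients of the energy at the given activities."""
    n = len(weights)
    errs = errors(weights, acts, alpha)
    return [-(alpha if is_residual(l, n) else 1.0) * np.outer(errs[l], acts[l])
            for l in range(n)]

def bp_grads(weights, x, y, alpha):
    """Backprop gradients of the quadratic loss 0.5 * ||y - f(x)||^2."""
    n = len(weights)
    acts = feedforward(weights, x, alpha)
    delta = acts[-1] - y
    grads = [None] * n
    for l in range(n - 1, -1, -1):
        res = is_residual(l, n)
        grads[l] = (alpha if res else 1.0) * np.outer(delta, acts[l])
        delta = back(weights[l], delta, res, alpha)
    return grads

# Hypothetical sizes: width much larger than depth.
rng = np.random.default_rng(0)
d_in, width, d_out, n_blocks = 8, 256, 4, 4
alpha = 1.0 / np.sqrt(width * n_blocks)  # assumed residual-branch scaling
weights = ([rng.normal(size=(width, d_in)) / np.sqrt(d_in)]
           + [rng.normal(size=(width, width)) for _ in range(n_blocks)]
           + [rng.normal(size=(d_out, width)) / width])
x, y = rng.normal(size=d_in), rng.normal(size=d_out)

acts_ff = feedforward(weights, x, alpha)
e_ff = energy(weights, acts_ff[:-1] + [y], alpha)  # equals the BP loss
acts_eq = infer(weights, x, y, alpha)
e_eq = energy(weights, acts_eq, alpha)

g_pc, g_bp = pc_grads(weights, acts_eq, alpha), bp_grads(weights, x, y, alpha)
flat = lambda gs: np.concatenate([g.ravel() for g in gs])
cos = flat(g_pc) @ flat(g_bp) / (np.linalg.norm(flat(g_pc)) * np.linalg.norm(flat(g_bp)))
print(f"BP loss: {e_ff:.6f}  relaxed energy: {e_eq:.6f}  grad cosine: {cos:.4f}")
```

With a feedforward initialisation and a clamped output, the energy before relaxation equals the quadratic BP loss, so the printed values show how far the energy moves during inference and how well the PC and BP gradients align for one choice of width and depth. A fixed number of gradient steps is used here for simplicity; whether an equilibrium has actually been reached should be checked before drawing any conclusion from the comparison.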