Predictive coding (PC) is an energy-based learning algorithm that performs iterative inference over network activities before updating weights. Recent work suggests that PC can converge in fewer learning steps than backpropagation thanks to this inference procedure. However, these advantages are not always observed, and the impact of PC inference on learning remains theoretically poorly understood. Here, we study the geometry of the PC energy landscape at the inference equilibrium of the network activities. For deep linear networks, we first show that the equilibrated energy is simply the mean squared error (MSE) loss with a weight-dependent rescaling. We then prove that many highly degenerate (non-strict) saddles of the loss, including the origin, become much easier to escape (strict saddles) in the equilibrated energy. Our theory is validated by experiments on both linear and non-linear networks. Based on these and other results, we conjecture that all the saddles of the equilibrated energy are strict. Overall, this work suggests that PC inference makes the loss landscape more benign and robust to vanishing gradients, while also highlighting the fundamental challenge of scaling PC to deeper models.
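The "rescaled MSE" claim can be made concrete. Writing $W_{L:k} = W_L \cdots W_k$ for products of weight matrices, the equilibrated energy of a deep linear network on a pair $(x, y)$ takes the form below; the explicit expression for the rescaling matrix $S(\theta)$ is a hedged reconstruction of the standard linear-PC equilibrium computation, offered as a sketch rather than a quotation of the paper's result.

\[
\mathcal{F}^{*}(\theta) \;=\; \tfrac{1}{2}\,\big(y - W_{L:1}x\big)^{\top} S(\theta)^{-1} \big(y - W_{L:1}x\big),
\qquad
S(\theta) \;=\; I \;+\; \sum_{\ell=2}^{L} W_{L:\ell}\,W_{L:\ell}^{\top}.
\]

Setting $S(\theta) = I$ recovers the standard MSE loss $\tfrac{1}{2}\|y - W_{L:1}x\|^{2}$; it is the weight dependence of $S$ that reshapes the saddle geometry relative to the loss.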
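To make the inference-then-learning structure of PC concrete, here is a minimal numpy sketch for a toy deep linear network: activities are relaxed by gradient descent on the energy with input and output clamped, and a weight update is taken at the (approximate) equilibrium. Layer sizes, learning rates, and iteration counts are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deep linear PC network (sizes are illustrative assumptions):
# activities z_0 = x (clamped input), z_L = y (clamped target), and
# energy F = sum_l 0.5 * ||z_l - W_l z_{l-1}||^2.
dims = [4, 8, 8, 2]
L = len(dims) - 1
Ws = [0.1 * rng.standard_normal((dims[l + 1], dims[l])) for l in range(L)]
x = rng.standard_normal(dims[0])
y = rng.standard_normal(dims[-1])

def energy(zs):
    """PC energy: sum of squared prediction errors over layers."""
    return sum(0.5 * np.sum((zs[l + 1] - Ws[l] @ zs[l]) ** 2) for l in range(L))

def errors(zs):
    """Layer-wise prediction errors eps_l = z_{l+1} - W_l z_l."""
    return [zs[l + 1] - Ws[l] @ zs[l] for l in range(L)]

# Initialise activities with a feedforward pass, then clamp the output.
zs = [x]
for W in Ws:
    zs.append(W @ zs[-1])
zs[-1] = y

# Inference: gradient descent on F w.r.t. the hidden activities only.
lr_z = 0.1
for _ in range(500):
    eps = errors(zs)
    for l in range(1, L):
        # dF/dz_l = eps_{l-1} (bottom-up error) - W_l^T eps_l (top-down error)
        zs[l] = zs[l] - lr_z * (eps[l - 1] - Ws[l].T @ eps[l])

# At equilibrium, compare the energy to the plain MSE of the same weights.
W_prod = Ws[0]
for W in Ws[1:]:
    W_prod = W @ W_prod
mse = 0.5 * np.sum((y - W_prod @ x) ** 2)
print(f"equilibrated energy: {energy(zs):.4f}  vs  MSE loss: {mse:.4f}")

# Learning: one gradient step on the weights at the equilibrated activities,
# using dF/dW_l = -eps_l z_l^T.
lr_w = 0.01
eps = errors(zs)
for l in range(L):
    Ws[l] = Ws[l] + lr_w * np.outer(eps[l], zs[l])
```

The printed comparison illustrates the abstract's first claim numerically: the equilibrated energy and the MSE loss differ only by the weight-dependent rescaling, so the gap between the two values shrinks as the weights (and hence the rescaling matrix) shrink toward the identity regime.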