Backpropagation, the foundational algorithm for training neural networks, is typically understood as a symbolic computation that recursively applies the chain rule. We show that it emerges exactly as the finite-time relaxation of a physical dynamical system. By formulating feedforward inference as a continuous-time process and applying the Lagrangian theory of non-conservative systems to handle asymmetric interactions, we derive a global energy functional on a doubled state space encoding both activations and sensitivities. The saddle-point dynamics of this energy perform inference and credit assignment simultaneously through local interactions. We term this framework "Dyadic Backpropagation". Crucially, we prove that unit-step Euler discretization, performed at the natural timescale of layer transitions, recovers standard backpropagation exactly in 2L steps for an L-layer network, with no approximations. Unlike prior energy-based methods, which require symmetric weights, asymptotic convergence, or vanishing perturbations, our framework guarantees exact gradients in finite time. This establishes backpropagation as the digitally optimized shadow of a continuous physical relaxation, and provides a rigorous foundation for exact gradient computation in analog and neuromorphic substrates, where continuous dynamics are native.
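To make the 2L-step claim concrete, the following NumPy sketch illustrates one way the finite-time relaxation could play out; the tanh nonlinearity, squared-error loss, layer sizes, and the names `a`, `d`, `gW` are our own illustrative assumptions, not notation from the paper. It applies L unit-step Euler updates to the activation state, then L to the sensitivity state, and checks that the resulting gradients coincide with textbook backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.tanh
dsigma = lambda z: 1.0 - np.tanh(z) ** 2  # derivative of tanh

# Hypothetical small network: a_{l+1} = sigma(W_l a_l), squared-error loss.
L, dims = 3, [4, 5, 5, 3]
W = [rng.standard_normal((dims[l + 1], dims[l])) for l in range(L)]
x, y = rng.standard_normal(dims[0]), rng.standard_normal(dims[-1])

# Doubled state space: activations a[l] and sensitivities d[l].
a = [x] + [np.zeros(dims[l]) for l in range(1, L + 1)]
d = [np.zeros(dims[l]) for l in range(L + 1)]

# Steps 1..L: unit-step Euler on the activation flow
#   da_l/dt = sigma(W_{l-1} a_{l-1}) - a_l,
# which makes each a_l stationary in a single step.
for l in range(1, L + 1):
    a[l] = sigma(W[l - 1] @ a[l - 1])

# Steps L+1..2L: unit-step Euler on the sensitivity flow, seeded by the
# loss gradient at the output boundary.
for l in range(L, 0, -1):
    if l == L:
        d[l] = a[L] - y
    else:
        d[l] = W[l].T @ (d[l + 1] * dsigma(W[l] @ a[l]))

# Weight gradients read off from the relaxed (a, d) pair.
gW = [np.outer(d[l + 1] * dsigma(W[l] @ a[l]), a[l]) for l in range(L)]

# Reference: textbook backpropagation on the same network.
def backprop(W, x, y):
    zs, acts = [], [x]
    for Wl in W:
        zs.append(Wl @ acts[-1])
        acts.append(sigma(zs[-1]))
    delta, grads = acts[-1] - y, [None] * len(W)
    for l in reversed(range(len(W))):
        g = delta * dsigma(zs[l])
        grads[l] = np.outer(g, acts[l])
        delta = W[l].T @ g
    return grads

assert all(np.allclose(g, r) for g, r in zip(gW, backprop(W, x, y)))
print("2L relaxation steps reproduce standard backprop gradients exactly.")
```

Under these assumptions, each unit Euler step drives exactly one residual of the flow to zero, so L forward steps plus L backward steps suffice, matching the 2L-step count stated above.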