Gradient-based algorithms are a cornerstone of artificial neural network training, yet whether biological neural networks use similar gradient-based strategies during learning remains an open question. Experiments reveal a diversity of synaptic plasticity rules, but it is unclear whether these rules amount to an approximation of gradient descent. Here we investigate a previously overlooked possibility: that learning dynamics may include fundamentally non-gradient "curl"-like components while still effectively optimizing a loss function. Curl terms emerge naturally in networks with inhibitory-excitatory connectivity or Hebbian/anti-Hebbian plasticity, producing learning dynamics that cannot be framed as gradient descent on any objective. To investigate the impact of these curl terms, we analyze feedforward networks within an analytically tractable student-teacher framework, systematically introducing non-gradient dynamics through neurons that exhibit rule-flipped plasticity. Small curl terms preserve the stability of the original solution manifold, so the learning dynamics resemble gradient descent. Beyond a critical strength, curl terms destabilize the solution manifold. Depending on the network architecture, this loss of stability can lead to chaotic learning dynamics that destroy performance. In other cases, the curl terms can counterintuitively speed up learning relative to gradient descent, because the weight dynamics can escape saddles by temporarily ascending the loss. Our results identify specific architectures capable of supporting robust learning via diverse learning rules, providing an important counterpoint to normative theories of gradient-based learning in neural networks.
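The claim that rule-flipped plasticity produces dynamics that are not gradient descent on any objective can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: a two-weight scalar student `y = w1 * w2 * x` fit to a teacher with gain `a`, a loss `0.5 * (w1 * w2 - a)**2`, and the hypothetical helper names `loss_grad`, `learning_field`, `jacobian`, and `curl_2d`. A vector field is the gradient of some scalar function only if its Jacobian is symmetric, so a nonzero antisymmetric part (the curl, in two dimensions) rules out any objective.

```python
import numpy as np


def loss_grad(w, a=1.0):
    """Gradient of the toy loss 0.5 * (w1 * w2 - a)**2 (illustrative assumption)."""
    w1, w2 = w
    err = w1 * w2 - a
    return np.array([err * w2, err * w1])


def learning_field(w, signs, a=1.0):
    """Weight velocity dw/dt; a sign of -1 flips the plasticity rule of that weight."""
    return -np.asarray(signs, dtype=float) * loss_grad(w, a)


def jacobian(field, w, eps=1e-6):
    """Central-difference Jacobian of a vector field at the point w."""
    w = np.asarray(w, dtype=float)
    n = w.size
    J = np.zeros((n, n))
    for j in range(n):
        dw = np.zeros(n)
        dw[j] = eps
        J[:, j] = (field(w + dw) - field(w - dw)) / (2 * eps)
    return J


def curl_2d(field, w):
    """Scalar curl in 2D: d f2 / d w1 - d f1 / d w2 (zero for any gradient field)."""
    J = jacobian(field, w)
    return J[1, 0] - J[0, 1]


w = np.array([0.5, 1.5])

# Ordinary gradient descent: both weights follow the negative gradient.
curl_gd = curl_2d(lambda v: learning_field(v, [1, 1]), w)

# Rule-flipped plasticity on the second weight only.
curl_flipped = curl_2d(lambda v: learning_field(v, [1, -1]), w)

print(f"curl under gradient descent: {curl_gd:.6f}")
print(f"curl with one flipped rule:  {curl_flipped:.6f}")
```

In this toy model the curl of the flipped dynamics works out analytically to `2 * (2 * w1 * w2 - a)`, which is nonzero almost everywhere, including on the solution set `w1 * w2 = a` whenever `a` is nonzero. The sketch only demonstrates the non-gradient character of the flipped rule; it does not reproduce the stability or chaos results described in the abstract.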