Non-Equilibrium Stochastic Dynamics as a Unified Framework for Insight and Repetitive Learning: A Kramers Escape Approach to Continual Learning

Continual learning in artificial neural networks is fundamentally limited by the stability--plasticity dilemma: systems that retain prior knowledge tend to resist acquiring new knowledge, and vice versa. Existing approaches, most notably elastic weight consolidation~(EWC), address this dilemma empirically but offer no physical account of why plasticity eventually collapses as tasks accumulate. Separately, the distinction between sudden insight and gradual skill acquisition through repetitive practice has lacked a unified theoretical description. Here, we show that both problems admit a common resolution within non-equilibrium statistical physics. We model the state of a learning system as a particle evolving under Langevin dynamics on a double-well energy landscape, with the noise amplitude governed by a time-dependent effective temperature $T(t)$. The probability density obeys a Fokker--Planck equation, and transitions between metastable states are governed by the Kramers escape rate $k = (\omega_0\omega_b/2\pi)\,e^{-\Delta E/T}$. We make two contributions. First, we identify the EWC penalty term as an energy barrier whose height grows linearly with the number of accumulated tasks, so that the transition rate collapses exponentially, a prediction we derive analytically and confirm numerically. Second, we show that insight and repetitive learning correspond to two qualitatively distinct temperature protocols within the same Fokker--Planck equation: insight events produce transient spikes in $T(t)$ that drive rapid barrier crossing, whereas repetitive practice operates at a modestly elevated but fixed temperature, achieving transitions through sustained stochastic diffusion. These results establish a physically grounded framework for understanding plasticity and its failure in continual learning systems, and suggest principled design criteria for adaptive noise schedules in artificial intelligence.
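
As an illustration of the quantities named above, the following minimal Python sketch evaluates the Kramers rate for a barrier that grows linearly with the task count and integrates overdamped Langevin dynamics under two temperature protocols. It is a sketch under stated assumptions, not the implementation used in this work: the quartic potential $U(x) = h(x^2-1)^2$, the units ($k_B = 1$, unit friction), all parameter values, and every function name (`kramers_rate`, `barrier_after_tasks`, `spike_protocol`, `constant_protocol`, `simulate`) are hypothetical choices made for illustration.

```python
import math
import random


def kramers_rate(omega_0, omega_b, delta_e, temperature):
    """Kramers escape rate k = (omega_0 * omega_b / 2 pi) * exp(-delta_e / T), with k_B = 1."""
    return omega_0 * omega_b / (2.0 * math.pi) * math.exp(-delta_e / temperature)


def barrier_after_tasks(n_tasks, delta_e0=1.0, delta_per_task=0.5):
    """Assumed linear barrier growth: Delta E(n) = delta_e0 + n * delta_per_task."""
    return delta_e0 + n_tasks * delta_per_task


def constant_protocol(t, level=0.4):
    """Repetitive practice (assumed form): modestly elevated, fixed temperature."""
    return level


def spike_protocol(t, base=0.1, peak=3.0, t_start=5.0, width=0.5):
    """Insight (assumed form): low baseline with one transient temperature spike."""
    return peak if t_start <= t < t_start + width else base


def simulate(temperature_of_t, barrier=1.0, dt=1e-3, n_steps=20000, seed=0):
    """Euler-Maruyama integration of dx = -U'(x) dt + sqrt(2 T(t)) dW
    on the assumed double well U(x) = barrier * (x^2 - 1)^2.
    Returns the trajectory as a list, starting in the left well (x = -1)."""
    rng = random.Random(seed)
    x = -1.0
    path = [x]
    for i in range(n_steps):
        t = i * dt
        force = -4.0 * barrier * x * (x * x - 1.0)  # -U'(x)
        noise = math.sqrt(2.0 * temperature_of_t(t) * dt) * rng.gauss(0.0, 1.0)
        x += force * dt + noise
        path.append(x)
    return path


# With the curvatures held fixed (an assumption), each added task multiplies
# the rate by the same factor exp(-delta_per_task / T).
for n in range(4):
    print(n, kramers_rate(1.0, 1.0, barrier_after_tasks(n), temperature=0.5))

# Fraction of time spent in the right well (x > 0) under each protocol.
for name, protocol in [("spike", spike_protocol), ("constant", constant_protocol)]:
    path = simulate(protocol)
    print(name, sum(1 for x in path if x > 0.0) / len(path))
```

Holding $\omega_0$ and $\omega_b$ fixed while the barrier grows isolates the exponential factor; in the quartic potential used for the simulation the curvatures also scale with $h$, so the prefactor would change as well. The simulated fractions depend on the seed and on the assumed parameters and should be read only as a qualitative comparison of the two protocols.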
