Continual training of deep neural networks under non-stationarity often leads to a progressive loss of plasticity, which eventually limits further learning. We relate plasticity to the empirical Neural Tangent Kernel and identify dynamical isometry (the condition that layer-wise Jacobian singular values remain close to one) as a key mechanism for preserving plasticity in continual learning. We revisit a class of networks that are almost-everywhere isometric while remaining universal approximators of Lipschitz functions, demonstrating that near-dynamical isometry is compatible with expressive nonlinear representations. For general architectures, we propose an efficient isometry-promoting regularization scheme and identify a novel mechanism by which it can reactivate dormant ReLU units. Building on this, we introduce AdamO, an Adam-style adaptive optimizer that decouples isometry regularization from the gradient-based update, analogous to the way AdamW decouples weight decay. We further reinterpret prior plasticity-preserving approaches through the lens of dynamical isometry, showing that they target only a partial measure of isometry. Across supervised and reinforcement-learning continual-learning benchmarks designed to induce plasticity loss, our methods consistently match or outperform existing approaches.
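The abstract does not give the form of the isometry regularizer or the exact AdamO update rule. The following is a minimal NumPy sketch of what "decoupled" means in the AdamW sense. It assumes an orthogonality penalty ¼‖WᵀW − I‖²_F on a weight matrix W of shape (out, in), which is one common isometry-promoting choice and not necessarily the paper's. For a linear layer y = Wx, the Jacobian is W itself, so driving WᵀW toward I pushes the Jacobian's singular values toward one. The class `DecoupledIsometryAdam` and its parameter `iso_coef` are hypothetical names for illustration, not AdamO itself.

```python
import numpy as np


def singular_values(W):
    # For a linear layer y = W x the input-output Jacobian is W, so these are
    # the Jacobian singular values that dynamical isometry wants close to one.
    return np.linalg.svd(W, compute_uv=False)


def isometry_penalty_grad(W):
    # Gradient of the assumed penalty 0.25 * ||W^T W - I||_F^2 with respect to W,
    # which works out to W (W^T W - I). It vanishes when W has orthonormal columns.
    n = W.shape[1]
    return W @ (W.T @ W - np.eye(n))


class DecoupledIsometryAdam:
    """Hypothetical Adam variant with a decoupled isometry term (AdamW-style)."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, iso_coef=1e-2):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.eps, self.iso_coef = eps, iso_coef
        self.m = None  # first-moment estimate of the task gradient only
        self.v = None  # second-moment estimate of the task gradient only
        self.t = 0

    def step(self, W, grad):
        if self.m is None:
            self.m = np.zeros_like(W)
            self.v = np.zeros_like(W)
        self.t += 1
        # Isometry gradient is taken at the pre-update weights, like AdamW's decay.
        iso = isometry_penalty_grad(W)
        # Standard Adam moments and bias correction, fed only by the task gradient.
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad**2
        m_hat = self.m / (1 - self.beta1**self.t)
        v_hat = self.v / (1 - self.beta2**self.t)
        W = W - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        # Decoupled step: the isometry term bypasses the adaptive normalization.
        W = W - self.lr * self.iso_coef * iso
        return W
```

Usage: with a zero task gradient, repeated steps on `np.diag([0.5, 0.8, 1.3, 1.6])` move all singular values toward one, while the moment buffers `m` and `v` stay at zero. Because the penalty gradient never enters the moment estimates, Adam's per-coordinate rescaling does not distort its strength. This is the same reason AdamW applies weight decay outside the adaptive update.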