Preserving Plasticity in Continual Learning via Dynamical Isometry

Continual training of deep neural networks under non-stationarity often leads to a progressive loss of plasticity, eventually limiting further learning. We relate plasticity to the empirical Neural Tangent Kernel, and identify dynamical isometry (the condition that layer-wise Jacobian singular values remain close to one) as a key mechanism for preserving plasticity in continual learning. We revisit a class of networks that are almost-everywhere isometric while remaining universal Lipschitz function approximators, demonstrating that near-dynamical isometry is compatible with expressive nonlinear representations. For general architectures, we propose an efficient isometry-promoting regularization scheme and identify a novel mechanism by which it can reactivate dormant ReLU units. Building on this, we introduce AdamO, an Adam-style adaptive optimizer that decouples isometry regularization from gradient updates, analogous to AdamW. We further reinterpret prior plasticity-preserving approaches through the lens of dynamical isometry, showing that they target only a partial measure of isometry. Across supervised and reinforcement-learning continual-learning benchmarks designed to induce plasticity loss, our methods consistently match or outperform existing approaches.
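As a rough illustration of what an isometry-promoting penalty can look like, the sketch below uses the generic soft-orthogonality term ||W^T W - I||_F^2 on a single square linear layer, whose Jacobian is simply W. This is an assumed stand-in, not the regularizer or the AdamO update proposed in the paper, which the abstract does not specify; all function names are hypothetical. The penalty is zero exactly when every singular value of W equals one, and plain gradient descent on it drives the singular values toward one.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def gram_minus_identity(w):
    """Return W^T W - I, which vanishes iff all singular values of W are 1."""
    g = matmul(transpose(w), w)
    for i in range(len(g)):
        g[i][i] -= 1.0
    return g

def isometry_penalty(w):
    """Squared Frobenius norm ||W^T W - I||_F^2 (assumed, generic penalty)."""
    return sum(x * x for row in gram_minus_identity(w) for x in row)

def isometry_penalty_grad(w):
    """Gradient of the penalty with respect to W: 4 W (W^T W - I)."""
    wg = matmul(w, gram_minus_identity(w))
    return [[4.0 * x for x in row] for row in wg]

def relax_toward_isometry(w, lr=0.02, steps=500):
    """Gradient descent on the penalty alone, with no task loss involved."""
    for _ in range(steps):
        grad = isometry_penalty_grad(w)
        w = [[x - lr * g for x, g in zip(w_row, g_row)]
             for w_row, g_row in zip(w, grad)]
    return w

if __name__ == "__main__":
    rng = random.Random(0)
    n = 6
    # Gaussian initialization with variance 1/n; singular values are spread out.
    w0 = [[rng.gauss(0.0, 1.0) / n ** 0.5 for _ in range(n)] for _ in range(n)]
    w1 = relax_toward_isometry(w0)
    print("penalty before:", isometry_penalty(w0))
    print("penalty after: ", isometry_penalty(w1))
```

In a decoupled scheme in the spirit of AdamW, a step like the one inside `relax_toward_isometry` would be applied to the weights separately from the adaptive gradient update rather than added to the loss; how the paper's AdamO does this in detail is not stated in the abstract.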

