Warm-starting neural network training by initializing networks with previously learned weights is appealing, as practical neural networks are often deployed under a continuous influx of new data. However, it often leads to loss of plasticity, where the network loses its ability to learn new information, resulting in worse generalization than training from scratch. This occurs even under stationary data distributions, and its underlying mechanism is poorly understood. We develop a framework emulating real-world neural network training and identify noise memorization as the primary cause of plasticity loss when warm-starting on stationary data. Motivated by this, we propose Direction-Aware SHrinking (DASH), a method aiming to mitigate plasticity loss by selectively forgetting memorized noise while preserving learned features. We validate our approach on vision tasks, demonstrating improvements in test accuracy and training efficiency.
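The abstract does not specify how DASH decides which weights encode memorized noise versus learned features. As a purely illustrative sketch (not the authors' actual algorithm), one way to realize "direction-aware shrinking" is to rescale each neuron's incoming weights by a factor that grows with the cosine alignment between the weight vector and a reference direction assumed to encode a learned feature; the function name `dash_shrink`, the reference `directions`, and the baseline factor `lam` are all hypothetical choices for this sketch:

```python
import numpy as np

def dash_shrink(weights, directions, lam=0.3):
    """Illustrative direction-aware shrinking (hypothetical form, not the
    paper's exact rule).

    weights:    (n_neurons, n_inputs) incoming weight vectors.
    directions: (n_neurons, n_inputs) reference directions assumed to
                encode learned features (e.g. averaged input activations).
    lam:        baseline shrink factor in [0, 1] for misaligned weights.

    Each weight vector is rescaled by a factor between lam and 1: weights
    well aligned with their reference direction are preserved, poorly
    aligned ones (interpreted here as memorized noise) are shrunk.
    """
    # Cosine similarity between each weight vector and its reference direction.
    wn = np.linalg.norm(weights, axis=1, keepdims=True) + 1e-12
    dn = np.linalg.norm(directions, axis=1, keepdims=True) + 1e-12
    cos = np.sum(weights * directions, axis=1, keepdims=True) / (wn * dn)
    # Map alignment to a shrink factor: lam for cos <= 0, up to 1 for cos = 1.
    scale = lam + (1.0 - lam) * np.clip(cos, 0.0, 1.0)
    return weights * scale
```

Under this sketch, a neuron whose weights point along its reference direction is left untouched, while an orthogonal (noise-like) neuron is scaled down toward zero before training resumes on new data, which is one plausible way to restore plasticity without discarding learned features.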