In independent, identically distributed (i.i.d.) training regimes, activation functions have been benchmarked extensively, and their differences often shrink once model size and optimization are tuned. In continual learning, however, the picture is different: beyond catastrophic forgetting, models can progressively lose the ability to adapt (a failure mode referred to as loss of plasticity), and the role of the non-linearity in this failure mode remains underexplored. We show that activation choice is a primary, architecture-agnostic lever for mitigating plasticity loss. Building on a property-level analysis of negative-branch shape and saturation behavior, we introduce two drop-in nonlinearities (Smooth-Leaky and Randomized Smooth-Leaky) and evaluate them in two complementary settings: (i) supervised class-incremental benchmarks and (ii) reinforcement learning in non-stationary MuJoCo environments designed to induce controlled distribution and dynamics shifts. We also provide a simple stress protocol and diagnostics that link the shape of the activation to adaptation under change. The takeaway is straightforward: thoughtful activation design offers a lightweight, domain-general way to sustain plasticity in continual learning without extra capacity or task-specific tuning.
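As a purely illustrative sketch of the kind of nonlinearity the abstract describes: the exact functional forms are defined in the paper, not here, so the softplus-based construction, the slope parameter `alpha`, the sharpness parameter `beta`, and the per-call slope sampling below are all assumptions, chosen only to show a leaky negative branch with a smooth (non-saturating) transition and a randomized variant.

```python
import math
import random

def smooth_leaky(x, alpha=0.1, beta=4.0):
    """Hypothetical smooth leaky activation (NOT the paper's definition).

    Blends a linear negative branch of slope `alpha` with the identity on
    the positive side via a scaled softplus, so the transition at 0 is
    smooth rather than kinked. `alpha` and `beta` are illustrative defaults.
    """
    # softplus(beta*x)/beta -> x for large positive x, -> 0 for large negative x
    return alpha * x + (1.0 - alpha) * math.log1p(math.exp(beta * x)) / beta

def randomized_smooth_leaky(x, alpha_low=0.05, alpha_high=0.3, rng=random):
    """Hypothetical randomized variant: sample the negative-branch slope
    per call (e.g. per forward pass during training)."""
    alpha = rng.uniform(alpha_low, alpha_high)
    return smooth_leaky(x, alpha=alpha)
```

For large positive inputs the sketch approaches the identity, and for large negative inputs it approaches `alpha * x`, i.e. it never fully saturates on the negative branch, which is the property the abstract ties to sustained plasticity.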