In independent, identically distributed (i.i.d.) training regimes, activation functions have been benchmarked extensively, and their differences often shrink once model size and optimization are tuned. In continual learning, however, the picture is different: beyond catastrophic forgetting, models can progressively lose the ability to adapt (a failure mode referred to as loss of plasticity), and the role of the non-linearity in this failure mode remains underexplored. We show that activation choice is a primary, architecture-agnostic lever for mitigating plasticity loss. Building on a property-level analysis of negative-branch shape and saturation behavior, we introduce two drop-in nonlinearities (Smooth-Leaky and Randomized Smooth-Leaky) and evaluate them in two complementary settings: (i) supervised class-incremental benchmarks and (ii) reinforcement learning in non-stationary MuJoCo environments designed to induce controlled distribution and dynamics shifts. We also provide a simple stress protocol and diagnostics that link the shape of the activation to adaptation under change. The takeaway is straightforward: thoughtful activation design offers a lightweight, domain-general way to sustain plasticity in continual learning without extra capacity or task-specific tuning.
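As a purely illustrative sketch of the kind of nonlinearity the abstract describes: the exact functional forms are defined in the paper, not here, so the softplus-based construction, the slope parameter `alpha`, the sharpness parameter `beta`, and the per-call slope sampling below are all assumptions, chosen only to show a leaky negative branch with a smooth (non-saturating) transition and a randomized variant.

```python
import math
import random

def smooth_leaky(x, alpha=0.1, beta=4.0):
    """Hypothetical smooth leaky activation (NOT the paper's definition).

    Blends a linear negative branch of slope `alpha` with the identity on
    the positive side via a scaled softplus, so the transition at 0 is
    smooth rather than kinked. `alpha` and `beta` are illustrative defaults.
    """
    # softplus(beta*x)/beta -> x for large positive x, -> 0 for large negative x
    return alpha * x + (1.0 - alpha) * math.log1p(math.exp(beta * x)) / beta

def randomized_smooth_leaky(x, alpha_low=0.05, alpha_high=0.3, rng=random):
    """Hypothetical randomized variant: sample the negative-branch slope
    per call (e.g. per forward pass during training)."""
    alpha = rng.uniform(alpha_low, alpha_high)
    return smooth_leaky(x, alpha=alpha)
```

For large positive inputs the sketch approaches the identity, and for large negative inputs it approaches `alpha * x`, i.e. it never fully saturates on the negative branch, which is the property the abstract ties to sustained plasticity.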