Prompt-based methods have recently gained prominence in Continual Learning (CL) due to their strong performance and memory efficiency. A prevalent strategy in this paradigm assigns a dedicated subset of prompts to each task, which, while effective, incurs substantial computational overhead and causes memory requirements to scale linearly with the number of tasks. Conversely, approaches employing a single shared prompt across tasks offer greater efficiency but often suffer from degraded performance due to knowledge interference. To reconcile this trade-off, we propose SMoPE, a novel framework that integrates the benefits of both task-specific and shared prompt strategies. Inspired by recent findings on the relationship between Prefix Tuning and Mixture of Experts (MoE), SMoPE organizes a shared prompt into multiple "prompt experts" within a sparse MoE architecture. For each input, only a select subset of relevant experts is activated, effectively mitigating interference. To facilitate expert selection, we introduce a prompt-attention score aggregation mechanism that computes a unified proxy score for each expert, enabling dynamic and sparse activation. Additionally, we propose an adaptive noise mechanism to encourage balanced expert utilization while preserving knowledge from prior tasks. To further enhance expert specialization, we design a prototype-based loss function that leverages prefix keys as implicit memory representations. Extensive experiments across multiple CL benchmarks demonstrate that SMoPE consistently outperforms task-specific prompt methods and achieves performance competitive with state-of-the-art approaches, all while significantly reducing parameter counts and computational costs.
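To make the expert-selection idea concrete, below is a minimal sketch, not the authors' released implementation, of how sparse prompt-expert selection described above could look: attention scores between an input query and each expert's prefix keys are aggregated into one proxy score per expert, perturbed with noise during training to encourage balanced utilization, and only the top-k experts contribute their prefixes. All names (`SMoPEPrefixSketch`, `num_experts`, `top_k`, `noise_std`) and the plain Gaussian noise are illustrative assumptions.

```python
# Hedged sketch of sparse prompt-expert selection via aggregated attention scores.
# Hyperparameter names and the simple mean aggregation are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SMoPEPrefixSketch(nn.Module):
    def __init__(self, dim=768, prefix_len=4, num_experts=8, top_k=2, noise_std=0.1):
        super().__init__()
        self.top_k = top_k
        self.noise_std = noise_std
        # Each expert owns a small block of prefix keys/values (a shared prompt split into experts).
        self.prefix_keys = nn.Parameter(torch.randn(num_experts, prefix_len, dim) * 0.02)
        self.prefix_values = nn.Parameter(torch.randn(num_experts, prefix_len, dim) * 0.02)

    def forward(self, query):
        # query: [batch, dim], e.g. a pooled query representation of the input.
        # Attention scores of the query against every expert's prefix keys: [batch, experts, prefix_len].
        scores = torch.einsum("bd,epd->bep", query, self.prefix_keys) / query.shape[-1] ** 0.5
        # Aggregate per-key scores into a single proxy score per expert.
        proxy = scores.mean(dim=-1)  # [batch, experts]
        if self.training and self.noise_std > 0:
            # Noise to encourage balanced expert utilization; the paper's adaptive noise
            # mechanism is more involved, plain Gaussian noise is used here for brevity.
            proxy = proxy + torch.randn_like(proxy) * self.noise_std
        # Sparse activation: keep only the top-k experts per input.
        gate = torch.full_like(proxy, float("-inf"))
        top_val, top_idx = proxy.topk(self.top_k, dim=-1)
        gate.scatter_(-1, top_idx, top_val)
        weights = F.softmax(gate, dim=-1)  # zero weight for unselected experts
        # Weighted combination of the selected experts' prefixes, prepended in attention.
        k = torch.einsum("be,epd->bpd", weights, self.prefix_keys)
        v = torch.einsum("be,epd->bpd", weights, self.prefix_values)
        return k, v


# Usage example: prefixes for a batch of two queries.
module = SMoPEPrefixSketch()
k, v = module(torch.randn(2, 768))
print(k.shape, v.shape)  # torch.Size([2, 4, 768]) torch.Size([2, 4, 768])
```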