Continual Learning (CL) in Automatic Speech Recognition (ASR) suffers from catastrophic forgetting when adapting to new tasks, domains, or speakers. A common strategy to mitigate this is to store a subset of past data in memory for rehearsal. However, rehearsal-based methods face key limitations: storing data is often costly, infeasible with pre-trained models, or restricted by privacy regulations. Running existing rehearsal-based methods with smaller memory sizes to alleviate these issues usually leads to degraded performance. We propose a rehearsal-based CL method that remains effective even with minimal memory. It operates in two stages: first, fine-tuning on the new task; second, applying Singular Value Decomposition (SVD) to the changes in linear layers and, in a parameter-efficient manner, using rehearsal to retrain only gating vectors on the singular values, which control the extent to which the updates from the first stage are accepted. We extensively test and analyze our method on two monolingual and two multilingual benchmarks. Our method reduces forgetting and outperforms state-of-the-art CL approaches for ASR, even when limited to a single utterance per previous task.
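The second-stage gating mechanism can be sketched as follows. This is a minimal NumPy illustration of the idea, not the paper's implementation: all names and shapes are assumptions, and in the actual method the gating vector would be retrained on rehearsal data rather than fixed.

```python
import numpy as np

# Hypothetical sketch of the two-stage update gating (names are assumptions).
rng = np.random.default_rng(0)
d = 8
W_old = rng.standard_normal((d, d))  # linear-layer weights before fine-tuning
W_new = rng.standard_normal((d, d))  # weights after stage 1 (fine-tuning on the new task)

# Stage 2: SVD of the weight change, then one gate per singular value.
U, S, Vt = np.linalg.svd(W_new - W_old)
g = np.ones_like(S)  # gating vector; only these d values would be retrained with rehearsal

# g = 1 accepts the full stage-1 update; g = 0 rejects it and recovers the old weights.
W_gated = W_old + U @ np.diag(g * S) @ Vt
assert np.allclose(W_gated, W_new)
```

Because only one gate per singular value is learned, the rehearsal step touches far fewer parameters than the full linear layer, which is what makes the method parameter-efficient.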