Automatic Speech Recognition (ASR) traditionally assumes known domains, but adding data from a new domain raises concerns about the computational cost of retraining models on both existing and new domains. Fine-tuning solely on the new domain risks Catastrophic Forgetting (CF). To address this, Lifelong Learning (LLL) algorithms have been proposed for ASR. Prior research has explored techniques such as Elastic Weight Consolidation, Knowledge Distillation, and Replay, all of which require either additional parameters or access to prior domain data. We propose Sequential Model Editing as a novel method to continually learn new domains in ASR systems. Unlike previous methods, our approach requires neither access to prior datasets nor the introduction of extra parameters. Our study demonstrates up to 15% Word Error Rate Reduction (WERR) over a fine-tuning baseline, and superior efficiency over other LLL techniques, on the CommonVoice English multi-accent dataset.