Continual learning for end-to-end automatic speech recognition (ASR) faces several difficulties. Fine-tuning tends to degrade performance on previously seen data, a phenomenon known as catastrophic forgetting. Conversely, strategies that freeze the existing parameters and append new tunable ones must maintain multiple models. We propose a strategy that maintains only a single model for inference and avoids catastrophic forgetting. Our experiments show that a simple linear interpolation of the parameters of several models, each fine-tuned from the same generalist model, yields a single model that performs well on all tested data. We selected two open-source end-to-end speech recognition models pre-trained on large datasets and fine-tuned them on three separate datasets: SPGISpeech, CORAAL, and DiPCo. The resulting average-of-domain-experts model performs well on all tested data and loses almost no performance on data from the original training domain.
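To make the idea of parameter interpolation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each fine-tuned expert is available as a mapping from parameter name to a flat list of values (a hypothetical stand-in for a real checkpoint), that all experts share the same architecture, and that uniform weights are used unless others are given. The name `average_experts`, the `Params` format, and the toy values are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical flat parameter format: parameter name -> list of values.
Params = Dict[str, List[float]]


def average_experts(experts: Sequence[Params],
                    weights: Optional[Sequence[float]] = None) -> Params:
    """Linearly interpolate the parameters of several fine-tuned models."""
    if not experts:
        raise ValueError("need at least one expert")
    if weights is None:
        # Assumption: uniform interpolation weights by default.
        weights = [1.0 / len(experts)] * len(experts)
    if len(weights) != len(experts):
        raise ValueError("one weight per expert is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")

    merged: Params = {}
    for name, reference in experts[0].items():
        # Every expert must expose the same parameter with the same size.
        for expert in experts:
            if name not in expert or len(expert[name]) != len(reference):
                raise ValueError(f"parameter {name!r} does not match across experts")
        # Weighted sum of the corresponding values, element by element.
        merged[name] = [
            sum(w * expert[name][i] for w, expert in zip(weights, experts))
            for i in range(len(reference))
        ]
    return merged


# Toy stand-ins for three experts fine-tuned from the same generalist model.
expert_a = {"encoder.weight": [1.0, 2.0], "decoder.bias": [0.0]}
expert_b = {"encoder.weight": [3.0, 4.0], "decoder.bias": [3.0]}
expert_c = {"encoder.weight": [5.0, 6.0], "decoder.bias": [6.0]}

merged = average_experts([expert_a, expert_b, expert_c])
```

Because the result has the same shape as any single expert, only one set of parameters needs to be kept for inference. With real checkpoints the same loop would run over tensors rather than Python lists.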