Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models

In this paper, we investigate domain adaptation for low-resource Automatic Speech Recognition (ASR) on target-domain data when a well-trained ASR model, trained on a large dataset, is available. We argue that in the encoder-decoder framework, the decoder of the well-trained ASR model is largely tuned towards the source domain, which hurts the performance of target-domain models under vanilla transfer learning. The encoder layers of the well-trained ASR model, on the other hand, mostly capture acoustic characteristics. We therefore propose using embeddings tapped from these encoder layers as features for a downstream Conformer target-domain model, and show that they provide significant improvements. We perform ablation studies on which encoder layer is optimal for tapping the embeddings, and on the effect of freezing versus updating the well-trained ASR model's encoder layers. We also show that applying Spectral Augmentation (SpecAug) to the proposed features, in addition to the default SpecAug on the input spectral features, yields a further gain in target-domain performance. With LibriSpeech-100-clean as the target domain and a well-trained model trained on SPGI-5000, we obtain a 30% relative improvement over the baseline. Similarly, with WSJ as the target domain and a well-trained model trained on LibriSpeech-960, we obtain a 50% relative improvement over the baseline.
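The feature-tapping idea can be sketched in a few lines of dependency-free Python. This is a toy illustration under stated assumptions, not the authors' implementation.

- The well-trained encoder is modeled as a list of layer functions. Each function maps a sequence of frames to a sequence of embeddings.
- `tap_encoder`, `spec_augment` and the toy layers are hypothetical names chosen for illustration.
- The output of layer `tap_layer` becomes the input features of the target-domain Conformer. The layers above it and the source-domain decoder are discarded.
- `spec_augment` applies one time mask and one feature mask to the tapped embeddings. This is a simplified form of SpecAugment.
- The default SpecAug on the input spectral features is not shown.

```python
# Toy sketch of tapping encoder embeddings; all names are illustrative,
# not taken from the paper.
import random
from typing import Callable, List

Frame = List[float]                           # one frame / embedding vector
Layer = Callable[[List[Frame]], List[Frame]]  # hypothetical encoder block


def tap_encoder(layers: List[Layer], feats: List[Frame], tap_layer: int) -> List[Frame]:
    """Run the source-domain encoder up to and including layer `tap_layer`
    (1-indexed) and return that layer's output. Higher layers and the
    source-domain decoder are never evaluated."""
    if not 1 <= tap_layer <= len(layers):
        raise ValueError(f"tap_layer must be in [1, {len(layers)}]")
    x = feats
    for layer in layers[:tap_layer]:
        x = layer(x)
    return x


def spec_augment(x: List[Frame], max_t: int, max_f: int,
                 rng: random.Random) -> List[Frame]:
    """Simplified SpecAugment: zero one random run of up to `max_t`
    consecutive frames (time mask) and one random run of up to `max_f`
    consecutive dimensions (feature mask). Returns a masked copy."""
    n_frames, n_dims = len(x), len(x[0])
    out = [row[:] for row in x]                # copy so the caller's data is untouched
    t = rng.randint(0, min(max_t, n_frames))   # time-mask width
    t0 = rng.randint(0, n_frames - t)          # time-mask start
    for i in range(t0, t0 + t):
        out[i] = [0.0] * n_dims
    f = rng.randint(0, min(max_f, n_dims))     # feature-mask width
    f0 = rng.randint(0, n_dims - f)            # feature-mask start
    for row in out:
        for j in range(f0, f0 + f):
            row[j] = 0.0
    return out


# Toy usage: 12 parameter-free "layers" that each add 1.0 to every value.
toy_layers: List[Layer] = [
    lambda x: [[v + 1.0 for v in row] for row in x] for _ in range(12)
]
feats = [[0.0] * 4 for _ in range(10)]              # 10 frames of 4-dim input features
emb = tap_encoder(toy_layers, feats, tap_layer=6)   # every value becomes 6.0
aug = spec_augment(emb, max_t=3, max_f=2, rng=random.Random(0))
# `aug` (not `emb`) would be the input to the target-domain Conformer.
```

In a real system the layers would be the trained encoder blocks of the well-trained model. The freeze-versus-update ablation then corresponds to whether those blocks' parameters receive gradients while the target-domain model is trained. The toy layers here have no parameters, so that choice is not modeled.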
