When the training and test domains are mismatched, current speech recognition systems show significant performance degradation. Self-training methods, such as noisy student-teacher training, can help address this by adapting models under such domain shifts. However, self-training typically requires a collection of unlabelled target-domain data. For settings where collecting such data is not practical, we investigate the benefit of performing noisy student-teacher training directly on the recordings in the test set as a test-time adaptation approach. Similarly to dynamic evaluation in language modelling, this transfers information across utterance boundaries and functions as a method of domain adaptation. Experiments on a range of in-domain and out-of-domain datasets demonstrate large relative gains of up to 32.2%. Interestingly, our method showed larger gains than the typical self-training setup that uses separate adaptation data.
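The adaptation loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy frame-level softmax classifier, the Gaussian input noise, the per-utterance teacher snapshot, and all names (`predict`, `add_noise`, `adapt_on_recording`) are assumptions chosen only to show the shape of the idea. A teacher labels clean test frames, a student is trained on noised copies of them, and the updated weights are carried from one utterance to the next.

```python
import copy
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, frame):
    # weights: one row of coefficients per class; frame: one feature vector.
    return softmax([sum(w * x for w, x in zip(row, frame)) for row in weights])


def add_noise(frame, rng, scale=0.1):
    # Hypothetical stand-in for the input augmentation a real system would use.
    return [x + rng.gauss(0.0, scale) for x in frame]


def adapt_on_recording(weights, utterances, lr=0.1, seed=0):
    """Self-train on the unlabelled utterances of one test recording.

    Returns adapted weights; the input weights are left untouched.
    """
    rng = random.Random(seed)
    student = copy.deepcopy(weights)
    for utt in utterances:
        # Assumption: the teacher is a frozen snapshot of the current student,
        # so information from earlier utterances reaches later ones.
        teacher = copy.deepcopy(student)
        for frame in utt:
            probs_t = predict(teacher, frame)  # teacher sees clean input
            label = max(range(len(probs_t)), key=probs_t.__getitem__)
            noisy = add_noise(frame, rng)      # student sees noised input
            probs_s = predict(student, noisy)
            # One SGD step on cross-entropy against the teacher's pseudo-label.
            for c, row in enumerate(student):
                err = probs_s[c] - (1.0 if c == label else 0.0)
                for j, x in enumerate(noisy):
                    row[j] -= lr * err * x
    return student
```

In use, one would call `adapt_on_recording` once per test recording, starting from the same source-domain weights each time, and then decode that recording with the returned weights. Whether to reset between recordings, and how many passes to make, are design choices this sketch does not settle.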