Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment. A prime example is TTA for Automatic Speech Recognition (ASR), which enhances model performance by using output prediction entropy minimization as a self-supervision signal. However, a key limitation of this self-supervision is its primary focus on acoustic features, with minimal attention to the linguistic properties of the input. To address this gap, we propose Language Informed Test-Time Adaptation (LI-TTA), which incorporates linguistic insights during TTA for ASR. LI-TTA integrates corrections from an external language model to fuse linguistic and acoustic information by minimizing the CTC loss from the correction alongside the standard TTA loss. With extensive experiments, we show that LI-TTA effectively improves the performance of TTA for ASR in various distribution shift situations.
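At a high level, the combined objective described above, entropy minimization plus a CTC loss against language-model-corrected transcripts, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `li_tta_loss`, the weighting factor `alpha`, and the use of a blank index of 0 are assumptions introduced here for clarity.

```python
import torch
import torch.nn.functional as F

def li_tta_loss(log_probs, corrected_targets, target_lengths, alpha=0.3):
    """Sketch of an LI-TTA-style objective (hypothetical implementation).

    log_probs:         (T, B, C) log-softmax outputs of the ASR model
    corrected_targets: (B, S) pseudo-label token ids from the external LM correction
    target_lengths:    (B,) lengths of each corrected transcript
    alpha:             assumed weight balancing the CTC term against entropy
    """
    T, B, C = log_probs.shape

    # Standard TTA self-supervision: minimize the entropy of the
    # model's frame-level output distribution.
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1).mean()

    # Linguistic supervision: CTC loss against the LM-corrected
    # transcript, treating it as a pseudo-label (blank index assumed 0).
    input_lengths = torch.full((B,), T, dtype=torch.long)
    ctc = F.ctc_loss(log_probs, corrected_targets, input_lengths,
                     target_lengths, blank=0)

    # Joint objective: acoustic self-supervision plus linguistic correction.
    return entropy + alpha * ctc
```

In practice the gradient of this joint loss would update only the adapted subset of model parameters at test time; the sketch shows only how the two loss terms are combined.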