Reducing noise interference is crucial for automatic speech recognition (ASR) in real-world scenarios. However, most single-channel speech enhancement (SE) methods generate "processing artifacts" that degrade ASR performance. In this study, we therefore propose a Noise- and Artifacts-aware loss function, NAaLoss, which mitigates the influence of artifacts from a novel perspective. NAaLoss combines three loss terms (estimation, de-artifact, and noise ignorance), enabling the learned SE model to model speech, artifacts, and noise individually. We examine two SE models (simple/advanced) trained with NAaLoss under various input scenarios (clean/noisy) using two configurations of the ASR system (with/without noise robustness). Experiments reveal that NAaLoss significantly improves ASR performance in most setups while preserving the perceptual quality and intelligibility of the enhanced speech. Furthermore, we visualize artifacts through waveforms and spectrograms and explain their impact on ASR.
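The abstract names the three components of NAaLoss but gives no formulas. The sketch below is one plausible reading, not the authors' definition. It assumes time-domain signals, an L1 distance for every term, additive noise, and equal default weights; the function name `naa_loss_sketch`, the weights `w_est`, `w_art`, `w_ign`, and the callable `se_model` are all hypothetical.

```python
import numpy as np

def naa_loss_sketch(se_model, clean, noise, w_est=1.0, w_art=1.0, w_ign=1.0):
    """Illustrative three-term objective (assumed form, not from the paper).

    se_model: any callable mapping a waveform array to an enhanced waveform.
    clean, noise: 1-D arrays of equal length.
    """
    noisy = clean + noise  # assumption: additive noise mixture

    # Estimation term: enhancing noisy speech should recover the clean signal.
    est = np.mean(np.abs(se_model(noisy) - clean))

    # De-artifact term: enhancing already-clean speech should leave it
    # unchanged, so any deviation is treated as a processing artifact.
    art = np.mean(np.abs(se_model(clean) - clean))

    # Noise-ignorance term: enhancing noise alone should yield silence.
    ign = np.mean(np.abs(se_model(noise)))

    return w_est * est + w_art * art + w_ign * ign
```

Under these assumptions, an SE model that passes its input through unchanged incurs no de-artifact penalty but pays on both the estimation and noise-ignorance terms, while a model that outputs silence pays on the estimation and de-artifact terms. This illustrates how the three terms could separate the handling of speech, artifacts, and noise.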