Adapting an automatic speech recognition (ASR) system to unseen noise environments is crucial. Integrating adapters into neural networks has emerged as a potent technique for transfer learning. This study thoroughly investigates adapter-based ASR adaptation in noisy environments. We conducted experiments on the CHiME-4 dataset. The results show that inserting adapters in the shallow layers is more effective, and that adapting only the shallow layers does not differ significantly from adapting all layers. Simulated data helps the system improve its performance under real noise conditions. Nonetheless, for the same amount of data, real data is more effective than simulated data. Multi-condition training remains useful for adapter training. Furthermore, integrating adapters into speech enhancement-based ASR systems yields substantial improvements.