Adapting an automatic speech recognition (ASR) system to unseen noise environments is crucial. Integrating adapters into neural networks has emerged as a potent technique for transfer learning. This study investigates adapter-based ASR adaptation in noisy environments through experiments on the CHiME-4 dataset. The results show that inserting adapters in the shallow layers is the most effective placement, and that adapting only the shallow layers does not differ significantly from adapting all layers. Simulated data helps the system improve its performance under real noise conditions; nonetheless, for the same amount of data, real data is more effective than simulated data. Multi-condition training remains useful for adapter training. Furthermore, integrating adapters into speech enhancement-based ASR systems yields substantial improvements.
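The abstract does not specify the adapter architecture, so the following is only a minimal sketch of one common design, assumed here for illustration: a residual bottleneck adapter (down-projection, ReLU, up-projection, skip connection) attached to selected layers of a frozen encoder. All names (`make_adapter`, `apply_adapter`, `encode`), the toy layers, and the dimensions are hypothetical and are not taken from the study; plain Python lists stand in for tensors.

```python
import random


def make_adapter(dim, bottleneck, seed=0):
    """Create weights for a hypothetical bottleneck adapter.

    The up-projection starts at zero, so a freshly created adapter
    leaves the hidden vector unchanged (an assumed, common choice).
    """
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
    up = [[0.0] * bottleneck for _ in range(dim)]
    return {"down": down, "up": up}


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def apply_adapter(adapter, hidden):
    """Return hidden + up(ReLU(down(hidden)))."""
    z = [max(0.0, v) for v in matvec(adapter["down"], hidden)]
    delta = matvec(adapter["up"], z)
    return [h + d for h, d in zip(hidden, delta)]


def encode(hidden, layers, adapters):
    """Run the frozen layers in order.

    `adapters` maps a layer index to an adapter; layers without an
    entry are left as they are. Only adapter weights would be trained.
    """
    for index, layer in enumerate(layers):
        hidden = layer(hidden)
        if index in adapters:
            hidden = apply_adapter(adapters[index], hidden)
    return hidden


# Toy frozen encoder with four layers (stand-ins for real encoder blocks).
layers = [lambda h: [0.5 * v + 1.0 for v in h] for _ in range(4)]

# Adapters on the two shallowest layers only, versus on every layer.
shallow_adapters = {0: make_adapter(3, 2, seed=0), 1: make_adapter(3, 2, seed=1)}
all_adapters = {i: make_adapter(3, 2, seed=i) for i in range(len(layers))}

x = [1.0, -2.0, 0.5]
baseline = encode(x, layers, {})
adapted_shallow = encode(x, layers, shallow_adapters)
adapted_all = encode(x, layers, all_adapters)
```

Under these assumptions, comparing shallow-only with all-layer adaptation amounts to changing which indices appear in the `adapters` mapping, while the encoder layers themselves stay frozen.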