Developing robust automatic speech recognition (ASR) systems for Arabic, a language characterized by its rich dialectal diversity and often considered a low-resource language in speech technology, demands effective strategies to manage its complexity. This study explores three critical factors influencing ASR performance: the role of dialectal coverage in pre-training, the effectiveness of dialect-specific fine-tuning compared to a multi-dialectal approach, and the ability to generalize to unseen dialects. Through extensive experiments across different dialect combinations, our findings offer key insights towards advancing the development of ASR systems for pluricentric languages like Arabic.