Arabic is a complex language with many varieties and dialects, spoken by over 450 million people around the world. This linguistic diversity and variation makes it challenging to build a robust, generalized ASR system for Arabic. In this work, we address this gap by developing and demonstrating a system, dubbed VoxArabica, for dialect identification (DID) and automatic speech recognition (ASR) of Arabic. We train a wide range of models, such as HuBERT (DID) and Whisper and XLS-R (ASR), in a supervised setting for Arabic DID and ASR tasks. Our DID models are trained to identify 17 different dialects in addition to Modern Standard Arabic (MSA). We fine-tune our ASR models on MSA, Egyptian, Moroccan, and mixed data. For the remaining dialects in ASR, we provide the option to choose among models such as Whisper and MMS in a zero-shot setting. We integrate these models into a single web interface with features such as audio recording, file upload, model selection, and the option to flag incorrect outputs. Overall, we believe VoxArabica will be useful for a wide range of audiences concerned with Arabic research. Our system is currently running at https://cdce-206-12-100-168.ngrok.io/.