Arabic is a complex language with many varieties and dialects, spoken by over 450 million people around the world. Because of this linguistic diversity and variation, it is challenging to build a robust, general-purpose ASR system for Arabic. In this work, we address this challenge by developing and demonstrating a system, dubbed VoxArabica, for dialect identification (DID) as well as automatic speech recognition (ASR) of Arabic. We train a range of models, including HuBERT (for DID) and Whisper and XLS-R (for ASR), in a supervised setting for Arabic DID and ASR tasks. Our DID models are trained to identify 17 different dialects in addition to MSA. We finetune our ASR models on MSA, Egyptian, Moroccan, and mixed data. For the remaining dialects, we provide the option to run ASR with models such as Whisper and MMS in a zero-shot setting. We integrate these models into a single web interface with features such as audio recording, file upload, model selection, and the option to flag incorrect outputs. Overall, we believe VoxArabica will be useful for a wide range of audiences concerned with Arabic research. Our system is currently running at https://cdce-206-12-100-168.ngrok.io/.
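The model-selection behaviour described in the abstract (finetuned ASR for MSA, Egyptian, and Moroccan; a zero-shot Whisper or MMS model for the remaining dialects) can be illustrated with a minimal Python sketch. This is not the VoxArabica implementation: the function name `choose_asr`, the `AsrChoice` type, the dialect label strings, and the model identifiers are all hypothetical, and the model finetuned on mixed data is left out for brevity.

```python
from dataclasses import dataclass

# Dialects for which the abstract reports a finetuned ASR model.
# The label strings themselves are assumptions made for this sketch.
FINETUNED_DIALECTS = {"MSA", "Egyptian", "Moroccan"}

# Zero-shot options named in the abstract; identifiers are hypothetical.
ZERO_SHOT_MODELS = {"whisper", "mms"}


@dataclass(frozen=True)
class AsrChoice:
    model: str  # hypothetical model identifier
    mode: str   # "finetuned" or "zero-shot"


def choose_asr(dialect: str, zero_shot_model: str = "whisper") -> AsrChoice:
    """Pick an ASR model for a dialect label produced by a DID model."""
    if dialect in FINETUNED_DIALECTS:
        # A dialect-specific finetuned model is available.
        return AsrChoice(model=f"finetuned-{dialect.lower()}", mode="finetuned")
    if zero_shot_model not in ZERO_SHOT_MODELS:
        raise ValueError(f"unknown zero-shot model: {zero_shot_model!r}")
    # No finetuned model for this dialect: fall back to zero-shot.
    return AsrChoice(model=zero_shot_model, mode="zero-shot")
```

For example, `choose_asr("Egyptian")` selects the finetuned path, while a dialect label outside the finetuned set falls back to whichever zero-shot model the user picked.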