This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers. We explore the feasibility of crowdsourcing a carefully annotated Quranic dataset on which AI models can be built to simplify the learning process. Specifically, we adopt a volunteer-based crowdsourcing approach and implement a crowdsourcing API to gather audio assets. We integrated the API into NamazApp, an existing mobile application, to collect audio recitations, and we developed a crowdsourcing platform called Quran Voice for annotating the gathered recordings. As a result, we collected around 7,000 Quranic recitations from 1,287 participants in more than 11 non-Arabic countries and annotated 1,166 of these recitations in six categories. The crowd achieved an accuracy of 0.77, and inter-rater agreement was 0.63 among the annotators and 0.89 between the labels assigned by the algorithm and the expert judgments.
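The abstract reports crowd accuracy and agreement scores but does not state how they are computed. The sketch below shows one plausible way to obtain such figures, under several assumptions: each recitation's crowd label is the majority vote of its annotators, crowd accuracy is measured against expert labels, and agreement is Cohen's kappa between two label sequences. The paper may use a different aggregation rule or agreement statistic (for example, Fleiss' kappa for more than two raters). The label names `"correct"` and `"wrong"` are hypothetical and do not correspond to the paper's six categories.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label seen first."""
    # Counter.most_common orders equal counts by first insertion (Python 3.7+)
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted: Sequence[Hashable], reference: Sequence[Hashable]) -> float:
    """Fraction of items whose predicted label matches the reference label."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(predicted)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    n = len(a)
    if n != len(b) or n == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    # Observed agreement: share of items where both raters chose the same label
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout: perfect agreement
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical votes from three volunteers per recitation
    votes = [
        ["correct", "correct", "wrong"],
        ["wrong", "wrong", "correct"],
        ["correct", "wrong", "correct"],
        ["wrong", "correct", "correct"],
    ]
    expert = ["correct", "wrong", "wrong", "correct"]  # hypothetical expert labels
    crowd = [majority_vote(v) for v in votes]
    print(accuracy(crowd, expert))     # 0.75
    print(cohen_kappa(crowd, expert))  # 0.5
```

Kappa is used here rather than raw percent agreement because it discounts the agreement two raters would reach by chance given how often each uses each label.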