The Quran is the holy scripture of Islam, and its recitation is an important aspect of the religion. Automatically recognizing recitation of the Holy Quran is challenging because recitation follows unique rules that do not apply to ordinary speech. Much research has been done in this domain, but previous work has either treated recitation error detection as a classification task or relied on traditional automatic speech recognition (ASR). In this paper, we propose a novel end-to-end deep learning model for recognizing the recitation of the Holy Quran. The proposed model consists of a CNN-bidirectional GRU encoder trained with Connectionist Temporal Classification (CTC) as its objective function, followed by a character-based beam search decoder. Moreover, all previous work used small private datasets consisting of short verses and a few chapters of the Holy Quran, so no comparisons between methods were possible. To overcome this issue, we used a recently published public dataset, Ar-DAD, which contains about 37 chapters recited by 30 reciters with different recitation speeds and different types of pronunciation rules. We evaluated the model with the most common metrics in speech recognition, word error rate (WER) and character error rate (CER), and obtained 8.34% WER and 2.42% CER. We hope this research will serve as a baseline for future work on this new public dataset (Ar-DAD).
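Two of the components named above, the character-level beam search over CTC outputs and the WER/CER metrics, can be illustrated with a short sketch. It is not the authors' implementation. It assumes the encoder emits per-frame character probabilities (a softmax over the alphabet) with index 0 reserved for the CTC blank, uses a plain prefix beam search with no language model, and computes WER and CER as normalized Levenshtein distance. All function names (`ctc_prefix_beam_search`, `edit_distance`, `wer`, `cer`) are hypothetical, and counting spaces as characters in CER is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ctc_prefix_beam_search(
    probs: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    beam_width: int = 8,
    blank: int = 0,
) -> str:
    """CTC prefix beam search without a language model (illustrative sketch).

    probs[t][c] is the probability of symbol c at frame t; alphabet[c] maps an
    index to a character; index `blank` is the CTC blank (assumed to be 0).
    """
    # Each prefix tracks two scores: paths ending in blank, paths ending in non-blank.
    beams: Dict[Tuple[int, ...], Tuple[float, float]] = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams: Dict[Tuple[int, ...], List[float]] = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_nonblank) in beams.items():
            for c, p in enumerate(frame):
                if c == blank:
                    # A blank never extends the prefix.
                    next_beams[prefix][0] += (p_blank + p_nonblank) * p
                    continue
                extended = prefix + (c,)
                if prefix and c == prefix[-1]:
                    # A repeated character is emitted twice only if a blank separates it...
                    next_beams[extended][1] += p_blank * p
                    # ...otherwise it collapses into the existing last character.
                    next_beams[prefix][1] += p_nonblank * p
                else:
                    next_beams[extended][1] += (p_blank + p_nonblank) * p
        # Keep only the `beam_width` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked[:beam_width]}
    best = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(alphabet[c] for c in best)


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance counting substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate; spaces count as characters here (an assumption)."""
    return edit_distance(ref, hyp) / len(ref)
```

For example, `ctc_prefix_beam_search([[0, 1, 0], [1, 0, 0], [0, 1, 0]], ["_", "a", "b"])` returns `"aa"`, because the blank in the middle frame keeps the two `a` emissions from collapsing into one. The sketch multiplies raw probabilities for readability; a practical decoder would work in log space to avoid underflow on long recitations. Whether reported WER/CER figures are averaged per utterance or computed over the whole test set (summed edit distances divided by total reference length) is not stated here and affects the final numbers.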