Mispronunciation Detection of Basic Quranic Recitation Rules using Deep Learning

In Islam, readers must apply a set of pronunciation rules, called Tajweed rules, to recite the Quran in the same way that the angel Jibrael taught the Prophet Muhammad. Traditionally, learning to apply these rules correctly requires a licensed and highly experienced teacher who can detect mispronunciation. As the number of Muslims around the world grows, there are no longer enough Tajweed teachers to support daily recitation practice for every Muslim. Consequently, considerable work has been done on automatic detection of Tajweed mispronunciation, with the aim of helping readers recite the Quran correctly more easily and in less time than traditional learning allows. Previous work shares three common problems. First, most of it relies solely on traditional machine learning algorithms. Second, it uses private datasets, so there is no benchmark for comparison. Third, it does not make full use of the sequential order of the input data, even though a speech signal is a time series. To overcome these problems, we propose a solution that combines Mel-Frequency Cepstral Coefficient (MFCC) features with a Long Short-Term Memory (LSTM) neural network, which models the time series directly, to detect mispronunciation of Tajweed rules. In addition, our experiments were performed on a public dataset, the QDAT dataset, which contains more than 1,500 audio recordings of correct and incorrect recitation of three Tajweed rules (Separate stretching, Tight Noon, and Hide). To the best of our knowledge, the QDAT dataset has not yet been used in any research paper. We compared the performance of the proposed LSTM model with the traditional machine learning algorithms used in state-of-the-art (SoTA) work. The LSTM model, which exploits the time series, clearly outperformed the traditional machine learning algorithms. The accuracy achieved by the LSTM on the QDAT dataset was 96%, 95%, and 96% for the three rules (Separate stretching, Tight Noon, and Hide), respectively.
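To make the role of the time series concrete, the following is a minimal sketch of how an LSTM consumes a sequence of MFCC frames in order and produces a single correct/incorrect score. It is an illustration only, not the model evaluated in this work: the weights are random and untrained, the frames are synthetic stand-ins for MFCC vectors (in practice these would be extracted from audio with a library such as librosa), and the sizes (13 coefficients, 50 frames, 8 hidden units) and the names `init_lstm` and `lstm_score` are assumptions chosen for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(n_in, n_hidden, seed=0):
    """Random (untrained) weights for the four LSTM gates and an output layer."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    # Each gate sees the current frame concatenated with the previous hidden state.
    names = ("input", "forget", "output", "cand")
    weights = {name: mat(n_hidden, n_in + n_hidden) for name in names}
    biases = {name: [0.0] * n_hidden for name in names}
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return weights, biases, w_out


def lstm_score(frames, params):
    """Feed frames through the LSTM in time order; return a score in (0, 1)."""
    weights, biases, w_out = params
    n_hidden = len(w_out)
    h = [0.0] * n_hidden  # hidden state
    c = [0.0] * n_hidden  # cell state (the long-term memory)
    for x in frames:
        z = list(x) + h
        pre = {
            name: [sum(w * v for w, v in zip(row, z)) + b
                   for row, b in zip(weights[name], biases[name])]
            for name in weights
        }
        i_gate = [sigmoid(v) for v in pre["input"]]
        f_gate = [sigmoid(v) for v in pre["forget"]]
        o_gate = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["cand"]]
        # Keep part of the old memory, add part of the new candidate.
        c = [f * c_old + i * g for f, c_old, i, g in zip(f_gate, c, i_gate, cand)]
        h = [o * math.tanh(c_new) for o, c_new in zip(o_gate, c)]
    # Binary decision (correct vs. incorrect recitation) from the final state.
    return sigmoid(sum(w * v for w, v in zip(w_out, h)))


# Synthetic stand-in for one recording: 50 frames of 13 MFCC-like values.
data_rng = random.Random(1)
frames = [[data_rng.uniform(-1.0, 1.0) for _ in range(13)] for _ in range(50)]
params = init_lstm(n_in=13, n_hidden=8)
score = lstm_score(frames, params)
print(score)
```

Because the hidden and cell states are carried from one frame to the next, feeding the same frames in a different order generally gives a different score. A classifier applied to features averaged over the whole recording cannot see that ordering, which is the limitation of earlier approaches noted above.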
