Spelling correction is the task of identifying spelling mistakes, typos, and grammatical errors in a given text and correcting them according to their context and grammatical structure. This work introduces "AraSpell," a framework for Arabic spelling correction that uses different seq2seq model architectures, such as Recurrent Neural Network (RNN) and Transformer, together with artificial data generation for error injection, and is trained on more than 6.9 million Arabic sentences. Thorough experimental studies provide empirical evidence of the effectiveness of the proposed approach, which achieved a word error rate (WER) of 4.8% and a character error rate (CER) of 1.11%, compared with 29.72% WER and 5.03% CER for the labeled data. The approach also achieved 10.65% WER and 2.9% CER, compared with 50.94% WER and 10.02% CER for the labeled data. Both results were obtained on a test set of 100K sentences.
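The abstract does not describe how the artificial errors are generated, so the following is only a minimal sketch of what character-level error injection could look like. The function name `inject_errors`, the `error_rate` parameter, the four edit operations, and the `ARABIC_LETTERS` alphabet are all assumptions made for illustration, not the AraSpell procedure.

```python
import random

# Assumed alphabet for inserted/substituted characters: the 28 basic Arabic letters.
ARABIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"


def inject_errors(sentence, error_rate=0.1, rng=None):
    """Return a noisy copy of `sentence` (hypothetical helper).

    Each non-space character is corrupted with probability `error_rate`
    by one of four edits: delete, insert, substitute, or transpose.
    """
    rng = rng or random.Random()
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch != " " and rng.random() < error_rate:
            op = rng.choice(("delete", "insert", "substitute", "transpose"))
            if op == "delete":
                pass  # drop the character
            elif op == "insert":
                out.append(ch)
                out.append(rng.choice(ARABIC_LETTERS))  # extra letter after it
            elif op == "substitute":
                out.append(rng.choice(ARABIC_LETTERS))  # may pick the same letter
            elif i + 1 < len(chars):
                out.append(chars[i + 1])  # swap with the next character
                out.append(ch)
                i += 1
            else:
                out.append(ch)  # nothing to swap with at the end
        else:
            out.append(ch)
        i += 1
    return "".join(out)


# Build one (noisy, clean) training pair from a clean sentence.
clean = "اللغة العربية جميلة"
pair = (inject_errors(clean, error_rate=0.2, rng=random.Random(0)), clean)
```

Pairs of this kind, with the noisy string as input and the clean string as target, are the sort of data a seq2seq corrector can be trained on without manual labeling.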
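For readers unfamiliar with the two reported metrics, WER and CER are commonly computed as the Levenshtein edit distance between a reference and a hypothesis, divided by the reference length in words or characters. The sketch below follows that common definition; the function names are illustrative, and the abstract does not state whether the reported scores were averaged per sentence or computed over the whole test set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a single wrong character in a four-word sentence counts as one full word error, which is why WER figures are typically several times larger than the corresponding CER figures.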