With the development of pre-trained models and the incorporation of phonetic and graphic information, neural models have achieved high scores in Chinese Spelling Check (CSC). However, because existing test sets are limited, these scores do not comprehensively reflect the models' capabilities. In this study, we abstract a representative model paradigm, implement it with nine structures, and evaluate them on comprehensive test sets that we constructed for different purposes. A detailed analysis of the results shows that: 1) Fusing phonetic and graphic information in a reasonable way is effective for CSC. 2) Models are sensitive to the error distribution of the test set, which exposes their shortcomings and indicates directions for future work. 3) Whether a model has previously seen the errors and their contexts has a significant impact on its performance. 4) The commonly used benchmark, SIGHAN, cannot reliably evaluate model performance.