Chinese Text Error Correction (CTEC) aims to detect and correct errors in input text, which benefits both daily life and various downstream tasks. Recent approaches mainly rely on Pre-trained Language Models (PLMs) to address CTEC. Although PLMs have achieved remarkable success on this task, we argue that previous studies overlook the importance of human thinking patterns. Inspired by how humans correct errors in everyday practice, we propose ProTEC, a novel model-agnostic progressive learning framework that guides PLM-based CTEC models to correct text the way humans do. ProTEC treats text error correction as a series of sub-tasks. During training, it guides the model to learn error correction by incorporating these sub-tasks into a progressive paradigm. During inference, the model completes the sub-tasks in turn to generate the correction results. Extensive experiments and detailed analyses demonstrate the effectiveness and efficiency of the proposed framework.