This paper proposes a Chinese spelling correction method based on plugin extension modules, addressing the limitations of existing models on domain-specific texts. Traditional Chinese spelling correction models are typically trained on general-domain datasets, so they perform poorly on the specialized terminology found in domain-specific texts. To address this issue, we design an extension module that learns the features of domain-specific terminology and supplies this domain knowledge to the model without compromising its general spelling correction performance, thereby improving its accuracy in specialized fields. Experimental results show that, after integrating extension modules for the medical, legal, and official document domains, the model's correction performance improves significantly over the baseline model without any extension modules.