Chinese Spelling Check (CSC) is the task of detecting and correcting spelling errors in Chinese text. In practical applications, CSC models need to correct errors across different domains. In this paper, we propose RSpell, a retrieval-augmented spelling check framework that retrieves relevant domain terms and incorporates them into CSC models. Specifically, we use pinyin fuzzy matching to retrieve terms, which are combined with the input and fed into the CSC model. We then introduce an adaptive process control mechanism that dynamically adjusts the influence of this external knowledge on the model. In addition, we develop an iterative strategy for the RSpell framework to enhance its reasoning capabilities. We conduct experiments on CSC datasets in three domains: law, medicine, and official document writing. The results show that RSpell achieves state-of-the-art performance in both zero-shot and fine-tuning scenarios, demonstrating the effectiveness of the retrieval-augmented CSC framework. Our code is available at https://github.com/47777777/Rspell.
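To make the retrieval step concrete, the following is a minimal sketch of pinyin fuzzy matching, not the RSpell implementation. Everything in it is an assumption made for illustration: the toy character-to-pinyin table `TOY_PINYIN`, the helper names `to_pinyin`, `edit_distance`, `retrieve_terms`, and `build_input`, the edit-distance threshold, and the `[SEP]`-joined input format. A real system would likely use a complete pinyin lexicon and a domain term dictionary.

```python
from typing import Dict, List

# Toy character-to-pinyin table (hypothetical; stands in for a full pinyin lexicon).
TOY_PINYIN: Dict[str, str] = {
    "原": "yuan", "员": "yuan", "告": "gao", "被": "bei",
    "上": "shang", "诉": "su", "素": "su",
}


def to_pinyin(text: str) -> str:
    """Concatenate the pinyin of each character; unknown characters are kept as-is."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in text)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def retrieve_terms(sentence: str, lexicon: List[str], max_dist: int = 1) -> List[str]:
    """Return lexicon terms whose pinyin is within max_dist of some
    same-length character window of the sentence."""
    hits = []
    for term in lexicon:
        term_py = to_pinyin(term)
        n = len(term)
        for start in range(len(sentence) - n + 1):
            window_py = to_pinyin(sentence[start:start + n])
            if edit_distance(window_py, term_py) <= max_dist:
                hits.append(term)
                break  # one matching window is enough for this term
    return hits


def build_input(sentence: str, terms: List[str], sep: str = "[SEP]") -> str:
    """Prepend retrieved terms to the sentence (the joining format is an assumption)."""
    return sentence if not terms else "、".join(terms) + sep + sentence


if __name__ == "__main__":
    legal_terms = ["原告", "上诉", "被告"]   # hypothetical domain lexicon
    noisy = "员告上素"                       # misspelling of 原告上诉
    found = retrieve_terms(noisy, legal_terms)
    print(found)
    print(build_input(noisy, found))
```

Matching on pinyin rather than on characters lets a homophone misspelling such as 员告 still retrieve the intended term 原告, and the distance threshold trades recall against the number of spurious terms passed to the CSC model.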