Chinese spelling check is the task of detecting and correcting spelling mistakes in Chinese text. Existing research enhances text representations and draws on multi-source information to improve models' detection and correction capabilities, but pays little attention to improving their ability to distinguish between confusable characters. Contrastive learning, which minimizes the distance in representation space between similar sample pairs, has recently become a dominant technique in natural language processing. Inspired by contrastive learning, we present a novel framework for Chinese spelling check that consists of three modules: language representation, spelling check, and reverse contrastive learning. Specifically, we propose a reverse contrastive learning strategy that explicitly forces the model to minimize the agreement between similar examples, namely phonetically and visually confusable characters. Experimental results show that our framework is model-agnostic and can be combined with existing Chinese spelling check models to yield state-of-the-art performance.
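To make "minimizing the agreement between confusable characters" concrete, here is a minimal sketch of one possible penalty. It assumes each character already has an embedding vector and that a confusion set maps each character to its phonetically or visually confusable characters. The penalty, softplus of the temperature-scaled cosine similarity, grows as a character and one of its confusables become more similar, so minimizing it pushes the pair apart. The names `reverse_contrastive_loss`, `confusion_sets`, and `temperature`, the softplus form, and the toy embeddings are all illustrative assumptions. This is not the loss used by the framework described above.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def reverse_contrastive_loss(
    embeddings: Dict[str, Sequence[float]],
    confusion_sets: Dict[str, List[str]],
    temperature: float = 0.1,  # hypothetical scaling hyperparameter
) -> float:
    """Mean softplus(cos / temperature) over (character, confusable) pairs.

    ASSUMPTION: an illustrative penalty, not the paper's loss. It increases
    with similarity, so minimizing it pushes each character's representation
    away from the representations of its confusable characters.
    """
    penalties = []
    for char, confusables in confusion_sets.items():
        if char not in embeddings:
            continue  # skip characters without a representation
        for other in confusables:
            if other == char or other not in embeddings:
                continue
            sim = cosine_similarity(embeddings[char], embeddings[other])
            penalties.append(softplus(sim / temperature))
    if not penalties:
        return 0.0
    return sum(penalties) / len(penalties)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings: 在/再 are phonetically confusable (both zài),
    # 己/已 are visually confusable.
    toy = {
        "在": [0.9, 0.1], "再": [0.8, 0.2],
        "己": [0.1, 0.9], "已": [-0.1, 0.9],
    }
    confusions = {"在": ["再"], "己": ["已"]}
    print(reverse_contrastive_loss(toy, confusions))
```

In a full model, a term like this would presumably be added, with some weight, to the spelling-check objective, with embeddings produced by the language representation module instead of a fixed dictionary. That wiring is likewise an assumption.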