Lexical Simplification (LS) aims to simplify text at the lexical level. Existing methods rely heavily on annotated data, which makes them difficult to apply in low-resource scenarios. In this paper, we propose a novel LS method that requires no parallel corpora. The method employs an Adversarial Editing System, guided by a confusion loss and an invariance loss, to predict lexical edits in the original sentences. We also introduce an LLM-enhanced loss that distills knowledge from Large Language Models (LLMs) into a small-size LS system. Based on the predicted edits, complex words in a sentence are masked, and a Difficulty-aware Filling module replaces the masked positions with simpler words. Finally, extensive experiments and analyses on three benchmark LS datasets demonstrate the effectiveness of the proposed method.
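The mask-then-fill step can be illustrated with a minimal sketch. It is not the Difficulty-aware Filling module itself, whose details the abstract does not give. It assumes that edit decisions arrive as per-token boolean flags, that a masked language model has already proposed scored candidates for each masked slot, and that word frequency serves as a stand-in for difficulty. The names `WORD_FREQ`, `difficulty`, `mask_complex_words`, `fill_mask`, and the weight `alpha` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy frequencies in [0, 1]; a real system would use corpus statistics.
WORD_FREQ: Dict[str, float] = {
    "start": 0.9,
    "begin": 0.7,
    "initiate": 0.1,
    "commence": 0.05,
}


def difficulty(word: str, freq: Dict[str, float]) -> float:
    """Assumed difficulty proxy: rarer words are harder; unseen words score 1.0."""
    return 1.0 - freq.get(word.lower(), 0.0)


def mask_complex_words(
    tokens: List[str], edit_flags: List[bool], mask: str = "[MASK]"
) -> List[str]:
    """Replace every token flagged as complex with the mask symbol."""
    return [mask if flag else tok for tok, flag in zip(tokens, edit_flags)]


def fill_mask(
    original: str,
    candidates: Dict[str, float],
    freq: Dict[str, float],
    alpha: float = 0.5,
) -> str:
    """Pick a replacement that fits the context and is simpler than the original.

    `candidates` maps each proposed word to a context-fit score in [0, 1]
    (assumed to come from a masked language model). Candidates that are not
    simpler than the original word are discarded; the rest are ranked by
    alpha * fit - (1 - alpha) * difficulty.
    """
    base = difficulty(original, freq)
    simpler = {w: s for w, s in candidates.items() if difficulty(w, freq) < base}
    if not simpler:
        return original  # keep the original word if nothing simpler is available
    return max(
        simpler,
        key=lambda w: alpha * simpler[w] - (1 - alpha) * difficulty(w, freq),
    )


tokens = ["The", "committee", "will", "commence", "the", "meeting"]
edit_flags = [False, False, False, True, False, False]  # assumed editor output
masked = mask_complex_words(tokens, edit_flags)

# Assumed masked-language-model proposals for the single masked slot.
proposals = {"initiate": 0.8, "start": 0.7, "begin": 0.6}
replacement = fill_mask("commence", proposals, WORD_FREQ)
simplified = [replacement if tok == "[MASK]" else tok for tok in masked]
print(" ".join(simplified))
```

In this toy setting the highest-scoring contextual candidate, "initiate", loses to a more frequent word once difficulty is taken into account, which is the intuition behind filling masked positions with simpler words rather than merely plausible ones.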