Lexical Simplification (LS) aims to simplify text at the lexical level. Existing methods rely heavily on annotated data, which makes them difficult to apply in low-resource scenarios. In this paper, we propose a novel LS method that requires no parallel corpora. The method employs an Adversarial Editing System, guided by a confusion loss and an invariance loss, to predict lexical edits in the original sentences. We also introduce an innovative LLM-enhanced loss that distills knowledge from Large Language Models (LLMs) into a small-size LS system. Based on the predicted edits, complex words within sentences are masked, and a Difficulty-aware Filling module replaces the masked positions with simpler words. Finally, extensive experimental results and analyses on three benchmark LS datasets demonstrate the effectiveness of the proposed method.
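To make the mask-then-fill step concrete, the following is a minimal sketch rather than the system proposed here. It assumes the set of complex words has already been produced by some edit predictor (standing in for the Adversarial Editing System, which is not reproduced), and it replaces the Difficulty-aware Filling module with a simple lookup. The names `mask_complex_words` and `fill_masks`, the candidate dictionary, and the difficulty scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

MASK = "[MASK]"  # placeholder token; the actual mask symbol is an assumption


def mask_complex_words(tokens: List[str], complex_words: Set[str]) -> List[str]:
    """Replace every token flagged as complex with the mask placeholder."""
    return [MASK if tok.lower() in complex_words else tok for tok in tokens]


def fill_masks(
    original: List[str],
    masked: List[str],
    candidates: Dict[str, List[str]],
    difficulty: Dict[str, float],
) -> List[str]:
    """Fill each masked position with the easiest candidate substitute.

    A candidate is used only if its (hypothetical) difficulty score is
    strictly lower than that of the original word; otherwise the original
    word is kept unchanged.
    """
    filled = []
    for orig_tok, tok in zip(original, masked):
        if tok != MASK:
            filled.append(tok)
            continue
        key = orig_tok.lower()
        base = difficulty.get(key, float("inf"))
        # Candidates without a known score are treated as maximally difficult.
        easier = [
            c for c in candidates.get(key, [])
            if difficulty.get(c, float("inf")) < base
        ]
        if easier:
            filled.append(min(easier, key=lambda c: difficulty[c]))
        else:
            filled.append(orig_tok)
    return filled


# Toy data, invented for this example only.
tokens = "The committee will scrutinize the proposal".split()
complex_words = {"scrutinize"}
candidates = {"scrutinize": ["examine", "check", "inspect"]}
difficulty = {"scrutinize": 0.9, "examine": 0.5, "check": 0.2, "inspect": 0.6}

masked = mask_complex_words(tokens, complex_words)
simplified = fill_masks(tokens, masked, candidates, difficulty)
print(" ".join(masked))
print(" ".join(simplified))
```

In a real system the candidate list and difficulty scores would come from learned models rather than hand-written tables; the sketch only shows how masking and difficulty-aware selection fit together.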