Lexical simplification (LS) methods based on pretrained language models have made remarkable progress, generating candidate substitutes for a complex word by analyzing its surrounding context. However, these methods require a separate pretrained model for each language and do not take preservation of sentence meaning into account. In this paper, we propose a novel multilingual LS method via paraphrase generation, since paraphrases offer diversity in word choice while preserving the sentence's meaning. We treat paraphrasing as a zero-shot translation task within a multilingual neural machine translation system that supports hundreds of languages. After feeding the input sentence into the encoder of the paraphrase model, we generate substitutes with a novel decoding strategy that concentrates solely on lexical variations of the complex word. Experimental results demonstrate that our approach significantly outperforms BERT-based methods and a zero-shot GPT-3-based method on English, Spanish, and Portuguese.
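As a rough illustration of a decoding step that concentrates on the complex word, the sketch below assumes one plausible reading of the abstract: the decoder is forced to copy the words preceding the complex word, and the next-token distribution at that position is read off as a ranked list of substitutes. This is not the paper's implementation. The names `substitute_candidates` and `toy_model` are hypothetical, and the probabilities are invented stand-ins for a real paraphraser's decoder output.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer type: maps a forced decoder prefix to next-token probabilities.
NextTokenFn = Callable[[Sequence[str]], Dict[str, float]]


def substitute_candidates(
    tokens: Sequence[str],
    complex_index: int,
    next_token_probs: NextTokenFn,
    top_k: int = 3,
) -> List[str]:
    """Rank replacements for tokens[complex_index].

    The decoder is assumed to have copied the words before the complex
    word; we only inspect the distribution over the next token.
    """
    prefix = list(tokens[:complex_index])
    complex_word = tokens[complex_index].lower()
    probs = next_token_probs(prefix)
    # Sort candidates from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Drop the complex word itself, then keep the top_k remaining candidates.
    return [word for word, _ in ranked if word.lower() != complex_word][:top_k]


def toy_model(prefix: Sequence[str]) -> Dict[str, float]:
    """Stand-in for a paraphraser's decoder; the probabilities are made up."""
    table = {
        ("the", "task", "was"): {
            "arduous": 0.40,
            "hard": 0.30,
            "difficult": 0.20,
            "easy": 0.10,
        },
    }
    return table.get(tuple(word.lower() for word in prefix), {})


sentence = "The task was arduous".split()
candidates = substitute_candidates(sentence, 3, toy_model)
print(candidates)
```

In a real system the toy table would be replaced by the decoder of a multilingual translation model conditioned on the encoded input sentence, and the candidates would still need filtering for meaning preservation and simplicity, which this sketch does not attempt.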