We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models. Given a target word and its context, our method generates substitutes based on both the target context and additional contexts sampled from monolingual data. We conduct experiments in English, Portuguese, and Spanish on the TSAR-2022 shared task and show that our model substantially outperforms other unsupervised systems in all three languages. We also establish a new state of the art by ensembling our model with GPT-3.5. Lastly, we evaluate our model on the SWORDS lexical substitution dataset, achieving a state-of-the-art result.