The ability to process idiomatic or literal multiword expressions is a crucial aspect of understanding and generating any language. Generating contextually relevant continuations for narratives that contain idiomatic (or literal) expressions lets us test how well generative language models (LMs) understand nuanced language containing non-compositional figurative text. We conduct a series of experiments using datasets in two languages (English and Portuguese) under three training settings (zero-shot, few-shot, and fine-tuned). Our results suggest that the models are better at generating continuations for literal contexts than for idiomatic contexts, but only by exceedingly small margins. Furthermore, the models studied in this work perform equally well across both languages, indicating the robustness of generative models in performing this task.