Metaphors, although often unnoticed, are ubiquitous in everyday language. It is therefore crucial that language models be able to grasp the underlying meaning of this kind of figurative language. In this work, we present Meta4XNLI, a novel parallel dataset for the tasks of metaphor detection and interpretation that contains metaphor annotations in both Spanish and English. Using this corpus, we investigate language models' metaphor identification and understanding abilities through a series of monolingual and cross-lingual experiments. To understand how these non-literal expressions affect model performance, we examine the results and perform an error analysis. Additionally, the parallel data makes it possible to investigate metaphor transferability between the two languages and the impact of translation on the development of multilingual annotated resources.