Metaphors are widely considered a challenge for a broad range of NLP tasks, which has given rise to the field of computational metaphor processing. However, it remains unclear which types of metaphors actually challenge current state-of-the-art models. In this paper, we test several NLP models on the VUA metaphor dataset and quantify the extent to which metaphors affect their performance on downstream tasks. Our analysis reveals that VUA includes a large number of metaphors that pose little difficulty for downstream tasks. We therefore argue that research attention should shift away from these metaphors and toward challenging ones. To this end, we propose an automatic pipeline that identifies the metaphors that are hard for a particular model. Our analysis shows that the hard metaphors we detect contrast significantly with VUA: for various popular NLP systems, they reduce machine translation accuracy by 16%, QA performance by 4%, NLI performance by 7%, and metaphor identification recall by over 14%.