Novel metaphor comprehension involves complex semantic processes and linguistic creativity, making it an interesting task for studying language models (LMs). This study investigates whether surprisal, a probabilistic measure of predictability in LMs, correlates with human annotations of metaphor novelty across different datasets. We analyse the surprisal of metaphoric words in corpus-based and synthetic metaphor datasets using 16 causal LM variants, and propose a cloze-style surprisal method that conditions on the full-sentence context. Results show that LM surprisal correlates significantly, though only moderately, with metaphor novelty scores and labels. We further identify divergent scaling patterns: on corpus-based data, correlation strength decreases with model size (an inverse scaling effect), whereas on synthetic data it increases (consistent with the quality-power hypothesis). We conclude that while surprisal can partially account for annotations of metaphor novelty, it remains limited as a metric of linguistic creativity. Code and data are publicly available: https://github.com/OmarMomen14/surprisal-metaphor-novelty
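As background, the surprisal of a word given its context is defined as the negative log probability the model assigns to that word, i.e. -log2 P(w | context). A minimal, model-agnostic sketch of this quantity follows; the toy probability distribution below is a hypothetical stand-in for a causal LM's next-token prediction, not output from any of the 16 models studied here:

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 of the word's conditional probability."""
    return -math.log2(prob)

# Hypothetical next-word probabilities standing in for a causal LM's
# prediction after a context like "He drowned in a sea of ...".
next_word_probs = {
    "water": 0.40,       # literal, conventional continuation
    "grief": 0.02,       # conventional metaphor
    "paperwork": 0.001,  # more novel metaphoric continuation
}

# Lower probability -> higher surprisal: the more novel continuation
# is markedly more surprising to the model.
for word, p in next_word_probs.items():
    print(f"{word}: {surprisal(p):.2f} bits")
```

The intuition tested in the paper is visible even in this toy setup: if novel metaphors tend to be less predictable, their surprisal should track annotated novelty, at least partially.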