Novel metaphor comprehension involves complex semantic processing and linguistic creativity, making it an interesting task for studying language models (LMs). This study investigates whether surprisal, a probabilistic measure of predictability under an LM, correlates with metaphor novelty ratings across different datasets. We analyse surprisal from 16 LM variants on corpus-based and synthetic metaphor novelty datasets, and explore a cloze-style surprisal method that conditions on the full sentence context. Results show that LM surprisal correlates significantly, at moderate strength, with metaphor novelty scores and labels. We further identify divergent scaling patterns: on corpus-based data, correlation strength decreases with model size (an inverse scaling effect), whereas on synthetic data it increases (consistent with the Quality-Power Hypothesis). We conclude that while surprisal can partially account for annotations of metaphor novelty, it remains a limited metric of linguistic creativity.
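As a concrete illustration of the cloze-style surprisal described above, here is a minimal sketch assuming a masked LM. The abstract does not name the 16 LM variants or the exact masking procedure, so the model choice (bert-base-uncased), the helper name cloze_surprisal, and the example sentences are all illustrative assumptions, not the paper's method.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "bert-base-uncased"  # illustrative stand-in, not one of the paper's 16 variants
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def cloze_surprisal(sentence: str, target: str) -> float:
    """Surprisal in bits of `target`, conditioned on the full sentence context."""
    # Mask the target word so the model predicts it from bidirectional context.
    masked = sentence.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
    target_id = tokenizer.convert_tokens_to_ids(target.lower())
    assert target_id != tokenizer.unk_token_id, "target must be a single vocabulary token"
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    log_prob = torch.log_softmax(logits, dim=-1)[target_id]
    return (-log_prob / torch.log(torch.tensor(2.0))).item()  # nats -> bits

# A conventional metaphor should yield lower surprisal than a novel one.
print(cloze_surprisal("Time is money.", "money"))
print(cloze_surprisal("Time is cheese.", "cheese"))
```

Unlike left-to-right autoregressive surprisal, this cloze formulation scores the target word against both preceding and following words, which is what "conditions on the full sentence context" refers to.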