Recent advances in the performance of large language models (LLMs) have sparked debate over whether, given sufficient training, high-level human abilities emerge in such generic forms of artificial intelligence (AI). Despite the exceptional performance of LLMs on a wide range of tasks involving natural language processing and reasoning, there has been sharp disagreement as to whether their competence extends to more creative human abilities. A core example is the ability to interpret novel metaphors. Given the enormous and non-curated text corpora used to train LLMs, a serious obstacle to designing tests is the requirement of finding novel yet high-quality metaphors that are unlikely to have been included in the training data. Here we assessed the ability of GPT-4, a state-of-the-art large language model, to provide natural-language interpretations of novel literary metaphors drawn from Serbian poetry and translated into English. Despite exhibiting no signs of having been exposed to these metaphors previously, the AI system consistently produced detailed and incisive interpretations. Human judges, blind to the fact that an AI model was involved, rated metaphor interpretations generated by GPT-4 as superior to those provided by a group of college students. In interpreting reversed metaphors, GPT-4, like the human participants, exhibited signs of sensitivity to the Gricean cooperative principle. These results indicate that LLMs such as GPT-4 have acquired an emergent ability to interpret complex novel metaphors.