Large Language Model Displays Emergent Ability to Interpret Novel Literary Metaphors

Recent advances in the performance of large language models (LLMs) have sparked debate over whether, given sufficient training, high-level human abilities emerge in such generic forms of artificial intelligence (AI). Despite the exceptional performance of LLMs on a wide range of tasks involving natural language processing and reasoning, there has been sharp disagreement as to whether their competence extends to more creative human abilities. A core example is the ability to interpret novel metaphors. Given the enormous and uncurated text corpora used to train LLMs, a serious obstacle to designing tests is the requirement of finding novel yet high-quality metaphors that are unlikely to have been included in the training data. Here we assessed the ability of GPT-4, a state-of-the-art large language model, to provide natural-language interpretations of novel literary metaphors drawn from Serbian poetry and translated into English. Despite exhibiting no signs of having been exposed to these metaphors previously, the AI system consistently produced detailed and incisive interpretations. Human judges, blind to the fact that an AI model was involved, rated metaphor interpretations generated by GPT-4 as superior to those provided by a group of college students. In interpreting reversed metaphors, GPT-4, like humans, exhibited signs of sensitivity to the Gricean cooperative principle. In addition, for several novel English poems, GPT-4 produced interpretations that were rated as excellent or good by a human literary critic. These results indicate that LLMs such as GPT-4 have acquired an emergent ability to interpret complex metaphors, including those embedded in novel poems.
