The recent advent of large language models has reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data. Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training. In human cognition, this capacity is closely tied to an ability to reason by analogy. Here, we performed a direct comparison between human reasoners and a large language model (the text-davinci-003 variant of GPT-3) on a range of analogical tasks, including a non-visual matrix reasoning task based on the rule structure of Raven's Standard Progressive Matrices. We found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings; preliminary tests of GPT-4 indicated even better performance. Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.