We recently reported evidence that large language models are capable of solving a wide range of text-based analogy problems in a zero-shot manner, indicating the presence of an emergent capacity for analogical reasoning. Two recent commentaries have challenged these results, citing evidence from so-called "counterfactual" tasks in which the standard sequence of the alphabet is arbitrarily permuted so as to decrease similarity with materials that may have been present in the language model's training data. Here, we reply to these critiques, clarifying some misunderstandings about the test materials used in our original work, and presenting evidence that language models are also capable of generalizing to these new counterfactual task variants.
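As a rough illustration of what a counterfactual letter-string problem of this kind might look like, the Python sketch below shuffles the alphabet with a fixed seed and builds a successor-style analogy over the shuffled order. The function names, the seed, the successor rule, and the prompt layout are all assumptions made for illustration; they are not the test materials used in the original work or in the commentaries.

```python
import random
import string


def permuted_alphabet(seed):
    """Return the 26 lowercase letters in an arbitrary but reproducible order."""
    letters = list(string.ascii_lowercase)
    random.Random(seed).shuffle(letters)  # seeded, so the same seed gives the same order
    return letters


def successor_problem(alphabet, start, length=4):
    """Build a (source, target) pair in which the last letter of the source
    is replaced by the letter that follows it in the given alphabet."""
    source = alphabet[start:start + length]
    target = source[:-1] + [alphabet[start + length]]
    return source, target


def format_prompt(alphabet, example, query):
    """Render a hypothetical zero-shot prompt: alphabet, one worked pair, one open query."""
    source, target = example
    return (
        "Alphabet: [" + " ".join(alphabet) + "]\n"
        "[" + " ".join(source) + "] -> [" + " ".join(target) + "]\n"
        "[" + " ".join(query) + "] -> ["
    )


if __name__ == "__main__":
    alphabet = permuted_alphabet(seed=0)
    example = successor_problem(alphabet, start=0)
    query, answer = successor_problem(alphabet, start=8)
    print(format_prompt(alphabet, example, query))
    print("Expected completion:", " ".join(answer))
```

With the standard alphabet passed in place of the shuffled one, the same functions yield the familiar form of the problem (`a b c d` maps to `a b c e`), which makes the contrast with the permuted variant easy to inspect.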