Large Language Models Are Not Abstract Reasoners - 专知论文

会员服务 ·

0

Performer · 语言模型化 · MoDELS · Cognition · Extensibility ·

2023 年 5 月 31 日

Large Language Models Are Not Abstract Reasoners

翻译：大型语言模型并非抽象推理者

Gaël Gendron,Qiming Bao,Michael Witbrock,Gillian Dobbie

from arxiv, 33 pages, 13 pages for the main paper and 20 pages for the supplement, 26 figures

Large Language Models have shown tremendous performance on a large variety of natural language processing tasks, ranging from text comprehension to common sense reasoning. However, the mechanisms responsible for this success remain unknown, and it is unclear whether LLMs can achieve human-like cognitive capabilities or whether these models are still fundamentally limited. Abstract reasoning is a fundamental task for cognition, consisting of finding and applying a general pattern from few data. Evaluating deep neural architectures on this task could give insight into their potential limitations regarding reasoning and their broad generalisation abilities, yet this is currently an under-explored area. In this paper, we perform extensive evaluations of state-of-the-art LLMs on abstract reasoning tasks, showing that they achieve very limited performance in contrast with other natural language tasks, and we investigate the reasons for this difference. We apply techniques that have been shown to improve performance on other NLP tasks and show that in most cases their impact on abstract reasoning performance is limited. In the course of this work, we have generated a new benchmark for evaluating language models on abstract reasoning tasks.

翻译：大型语言模型在从文本理解到常识推理的广泛自然语言处理任务中展现了卓越的性能。然而，支撑这一成功的机制仍不明确，尚不清楚这些模型能否实现人类般的认知能力，抑或存在根本性局限。抽象推理作为认知的核心任务，要求从少量数据中发现并应用通用模式。评估深度神经网络在此类任务上的表现，有助于揭示其在推理能力及广泛泛化方面的潜在局限，但目前这一领域尚待深入探索。本文对当前最先进的大型语言模型在抽象推理任务上进行了全面评估，结果显示，与自然语言任务相比，这些模型在此领域的表现极为有限。我们进一步探究了造成这一差异的原因，并应用了已被证明可提升其他NLP任务性能的技术，但发现这些技术在绝大多数情况下对抽象推理性能的提升效果甚微。在研究过程中，我们构建了一个新的基准测试，用于评估语言模型在抽象推理任务上的表现。

0

相关内容

Performer

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

介孔材料受限空间中的AGET ATRP和ARGET ATRP聚合反应

国家自然科学基金

0+阅读 · 2016年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

大气细颗粒物PM2.5对大鼠神经损伤机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

BDNF/TrkB途径介导调控骨髓瘤MDSCs破骨分化的作用和机制

国家自然科学基金

0+阅读 · 2013年12月31日

流程泵径向叶轮内空化流动机理及动力学特性研究

国家自然科学基金

0+阅读 · 2013年12月31日

一株含双降解质粒的红球菌（Rhodococcus sp.）二噁英降解机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

β-Sarcoglycan在mSOD1介导ALS骨骼肌病变中的机制研究

国家自然科学基金

1+阅读 · 2012年12月31日

白桦FT及SOC1基因的RNAi研究

国家自然科学基金

0+阅读 · 2009年12月31日

RIP140在神经元和神经胶质细胞增殖中的作用研究

国家自然科学基金

0+阅读 · 2008年12月31日

Sabiá: Portuguese Large Language Models

Sabiá: Portuguese Large Language Models

Arxiv

0+阅读 · 2023年7月20日

Large Language Models can accomplish Business Process Management Tasks

Arxiv

0+阅读 · 2023年7月19日

Can In-context Learners Learn a Reasoning Concept from Demonstrations?

Arxiv

0+阅读 · 2023年7月19日

Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community

Arxiv

0+阅读 · 2023年7月19日

A Comprehensive Overview of Large Language Models

Arxiv

21+阅读 · 2023年7月12日

A Survey on Large Language Models for Recommendation

Arxiv

12+阅读 · 2023年5月31日

Augmented Large Language Models with Parametric Knowledge Guiding

Arxiv

20+阅读 · 2023年5月8日

Towards Reasoning in Large Language Models: A Survey

Arxiv

34+阅读 · 2022年12月20日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Variational Knowledge Graph Reasoning

Arxiv

15+阅读 · 2018年4月5日

VIP会员

文章信息

相关主题

语言模型化

最新内容

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

1+阅读 · 26分钟前

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

1+阅读 · 28分钟前

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

4+阅读 · 今天6:30

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

4+阅读 · 今天6:18

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

4+阅读 · 今天6:08

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

4+阅读 · 今天5:54

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

专知会员服务

4+阅读 · 今天5:22

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

专知会员服务

5+阅读 · 今天5:15

《国防领域敏感性分析白皮书》

《国防领域敏感性分析白皮书》

专知会员服务

5+阅读 · 今天3:42

综述 | 从问答到任务完成：Agent系统与Harness设计

综述 | 从问答到任务完成：Agent系统与Harness设计

专知会员服务

4+阅读 · 6月24日

Agentic RL：框架、实践与长程智能体训练

Agentic RL：框架、实践与长程智能体训练

专知会员服务

4+阅读 · 6月24日

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

9+阅读 · 6月24日

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

8+阅读 · 6月24日

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

6+阅读 · 6月24日

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

8+阅读 · 6月24日

相关VIP内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

网状网络及其在军事领域的运用

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

相关资讯

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

相关论文

Sabiá: Portuguese Large Language Models

Sabiá: Portuguese Large Language Models

Arxiv

0+阅读 · 2023年7月20日

Large Language Models can accomplish Business Process Management Tasks

Arxiv

0+阅读 · 2023年7月19日

Can In-context Learners Learn a Reasoning Concept from Demonstrations?

Arxiv

0+阅读 · 2023年7月19日

Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community

Arxiv

0+阅读 · 2023年7月19日

A Comprehensive Overview of Large Language Models

Arxiv

21+阅读 · 2023年7月12日

A Survey on Large Language Models for Recommendation

Arxiv

12+阅读 · 2023年5月31日

Augmented Large Language Models with Parametric Knowledge Guiding

Arxiv

20+阅读 · 2023年5月8日

Towards Reasoning in Large Language Models: A Survey

Arxiv

34+阅读 · 2022年12月20日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

Variational Knowledge Graph Reasoning

Arxiv

15+阅读 · 2018年4月5日

相关基金

介孔材料受限空间中的AGET ATRP和ARGET ATRP聚合反应

国家自然科学基金

0+阅读 · 2016年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

大气细颗粒物PM2.5对大鼠神经损伤机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

BDNF/TrkB途径介导调控骨髓瘤MDSCs破骨分化的作用和机制

国家自然科学基金

0+阅读 · 2013年12月31日

流程泵径向叶轮内空化流动机理及动力学特性研究

国家自然科学基金

0+阅读 · 2013年12月31日

一株含双降解质粒的红球菌（Rhodococcus sp.）二噁英降解机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

β-Sarcoglycan在mSOD1介导ALS骨骼肌病变中的机制研究

国家自然科学基金

1+阅读 · 2012年12月31日

白桦FT及SOC1基因的RNAi研究

国家自然科学基金

0+阅读 · 2009年12月31日

RIP140在神经元和神经胶质细胞增殖中的作用研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员