Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after reading their input. However, in practice, transformers' reasoning can be improved by allowing them to use a "chain of thought" or "scratchpad", i.e., to generate and condition on a sequence of intermediate tokens before answering. Motivated by this, we ask: Does such intermediate generation fundamentally extend the computational power of a decoder-only transformer? We show that the answer is yes, but the size of the increase depends crucially on the amount of intermediate generation. For instance, we find that transformer decoders with a logarithmic number of decoding steps (w.r.t. the input length) push the limits of standard transformers only slightly, while a linear number of decoding steps, assuming projected pre-norm (a slight generalization of standard pre-norm), adds a clear new ability (under standard complexity conjectures): recognizing all regular languages. Our results also imply that linear steps keep transformer decoders within context-sensitive languages, and that polynomial steps with generalized pre-norm make them recognize exactly the class of polynomial-time solvable problems, the first exact characterization of a class of transformers in terms of standard complexity classes. Together, these results provide a nuanced framework for understanding how the length of a transformer's chain of thought or scratchpad impacts its reasoning power.
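To make the linear-steps claim concrete, the following is a minimal conceptual sketch, not the paper's construction: if a decoder emits one intermediate "state token" per input symbol, a scratchpad of length n suffices to simulate a finite-state machine on an input of length n. The example automaton and the names `DELTA`, `ACCEPT`, and `decode_with_scratchpad` are hypothetical choices made for this illustration.

```python
# Illustrative sketch (an assumption-laden toy, not the paper's construction):
# a chain of thought with one intermediate "state token" per input symbol
# suffices to simulate a deterministic finite automaton (DFA), i.e., to
# recognize a regular language with a linear number of decoding steps.

# Example DFA over {"a", "b"}: accepts strings with an even number of "b"s.
START, ACCEPT = "even", {"even"}
DELTA = {
    ("even", "a"): "even", ("even", "b"): "odd",
    ("odd", "a"): "odd",   ("odd", "b"): "even",
}

def decode_with_scratchpad(inp: str) -> bool:
    """Simulate the DFA by autoregressively emitting one state token per step.

    Each step needs only the previous scratchpad token and the next input
    symbol; such a local update is easy for a transformer decoder to realize.
    """
    scratchpad = [START]                 # intermediate tokens = DFA states
    for symbol in inp:                   # n input symbols -> n decoding steps
        scratchpad.append(DELTA[(scratchpad[-1], symbol)])
    return scratchpad[-1] in ACCEPT      # final token determines the answer

assert decode_with_scratchpad("abba") is True   # two "b"s: even, accepted
assert decode_with_scratchpad("ab") is False    # one "b": odd, rejected
```

The point of the sketch is that each decoding step is a constant-size computation over the most recent scratchpad token and one input symbol; this locality is why a linear number of intermediate tokens is enough to recognize any regular language, in contrast to answering immediately.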