Transformers as Algorithms: Generalization and Stability in In-context Learning


February 6, 2023


Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, Samet Oymak

From arXiv. The revised version significantly improves the stability guarantees and provides new experiments.

In-context learning (ICL) is a type of prompting where a transformer model operates on a sequence of (input, output) examples and performs inference on-the-fly. In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference-time. We first explore the statistical aspects of this abstraction through the lens of multitask learning (MTL): We obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system. The crux of our analysis is relating the excess risk to the stability of the algorithm implemented by the transformer. We characterize when the transformer/attention architecture provably obeys the stability condition and also provide empirical verification. For generalization on unseen tasks, we identify an inductive bias phenomenon in which the transfer learning risk is governed by the task complexity and the number of MTL tasks in a highly predictable manner. Finally, we provide numerical evaluations that (1) demonstrate transformers can indeed implement near-optimal algorithms on classical regression problems with i.i.d. and dynamic data, (2) provide insights on stability, and (3) verify our theoretical predictions.
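The "prompt in, hypothesis out" view in the abstract can be made concrete with a small sketch. It is not the paper's transformer or its experimental setup: scalar ridge regression stands in for the algorithm, and all names (`sample_example`, `ridge_algorithm`, `stability_gap`), the noise level, and the regularizer are assumptions made for illustration. The sketch builds a prompt of i.i.d. (input, label) pairs, maps it to a hypothesis function, and measures how much the prediction at a query moves when a single prompt example is resampled, which is the kind of stability the abstract refers to.

```python
import random

def sample_example(rng, w, noise):
    """Draw one (x, y) pair from the scalar linear task y = w * x + noise."""
    x = rng.gauss(0.0, 1.0)
    return x, w * x + rng.gauss(0.0, noise)

def ridge_algorithm(prompt, lam=1.0):
    """Map a prompt [(x, y), ...] to a hypothesis f(x) = w_hat * x.

    Ridge regression is only a stand-in for whatever algorithm a trained
    transformer implements in context (an assumption of this sketch).
    """
    sxy = sum(x * y for x, y in prompt)
    sxx = sum(x * x for x, _ in prompt)
    w_hat = sxy / (sxx + lam)
    return lambda x: w_hat * x

def stability_gap(n, trials=500, noise=0.1, seed=0):
    """Average change in the query prediction when one prompt example is resampled."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        w = rng.gauss(0.0, 1.0)  # a fresh task for every trial
        prompt = [sample_example(rng, w, noise) for _ in range(n)]
        # Replace only the first example; the other n - 1 stay identical.
        perturbed = [sample_example(rng, w, noise)] + prompt[1:]
        x_query = rng.gauss(0.0, 1.0)
        f, f_perturbed = ridge_algorithm(prompt), ridge_algorithm(perturbed)
        total += abs(f(x_query) - f_perturbed(x_query))
    return total / trials

gap_short = stability_gap(10)    # prompt with 10 examples
gap_long = stability_gap(100)    # prompt with 100 examples
print(f"gap with 10 examples:  {gap_short:.5f}")
print(f"gap with 100 examples: {gap_long:.5f}")
```

For this stand-in algorithm the gap shrinks as the prompt grows, since each example carries less weight in a longer prompt. Whether a given transformer behaves the same way is what the abstract says the paper characterizes and verifies empirically.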

