We study in-context learning for nonparametric regression with $\alpha$-Hölder smooth regression functions, for some $\alpha>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $\Theta(\log n)$ parameters and $\Omega\bigl(n^{2\alpha/(2\alpha+d)}\log^3 n\bigr)$ pretraining sequences can achieve the minimax-optimal rate of convergence $O\bigl(n^{-2\alpha/(2\alpha+d)}\bigr)$ in mean squared error. Our result requires substantially fewer transformer parameters and pretraining sequences than previous results in the literature. This is achieved by showing that transformers can efficiently approximate local polynomial estimators by implementing a kernel-weighted polynomial basis and then running gradient descent.
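The abstract does not display the estimator itself; as a minimal sketch, the textbook local polynomial estimator it invokes (with a kernel $K$, a bandwidth $h$, and multi-indices $k$ of order at most $\lfloor\alpha\rfloor$ as assumed notation, none of which appear in the abstract) fits, at a query point $x$, the kernel-weighted least-squares objective
\[
\hat\beta(x) \in \operatorname*{arg\,min}_{\beta}\; \sum_{i=1}^{n} K\!\Bigl(\tfrac{X_i - x}{h}\Bigr)\Bigl(Y_i - \sum_{|k|\le\lfloor\alpha\rfloor}\beta_k\,(X_i - x)^{k}\Bigr)^{2},
\qquad \hat f(x) = \hat\beta_{0}(x),
\]
where a bandwidth of order $h \asymp n^{-1/(2\alpha+d)}$ yields the $n^{-2\alpha/(2\alpha+d)}$ rate. On this reading, the transformer's in-context computation corresponds to forming the kernel-weighted polynomial features $(X_i-x)^{k}$ and then minimizing this quadratic objective by gradient descent.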