切片ReLU注意力：通过排序实现准线性上下文表达能力 (Sliced ReLU attention: Quasi-linear contextual expressivity via sorting) - 专知论文

会员服务 ·

0

ReLU · 上下文 · 排序 · 准线性 · 结构 ·

2025 年 12 月 12 日

Sliced ReLU attention: Quasi-linear contextual expressivity via sorting

翻译：切片ReLU注意力：通过排序实现准线性上下文表达能力

Siwan Boufadène,François-Xavier Vialard

We introduce sliced ReLU attention, a new attention mechanism that departs structurally from both softmax and ReLU-based alternatives. Instead of applying a nonlinearity to pairwise dot products, we operate on one-dimensional projections of key--query differences and leverage sorting to obtain quasi-linear complexity. This construction yields a differentiable, non-symmetric kernel that can be computed in O(n log(n)) through a sorting procedure, making it suitable for very long contexts. Beyond computational benefits, the model retains strong theoretical expressive power: we establish two in-context expressivity results, previously known for softmax attention, showing that sliced ReLU attention preserves the ability to perform nontrivial sequence-to-sequence disentangling tasks and satisfies a contextual universal approximation property. Finally, we illustrate the potential practical interest of this kernel in small-scale experiments.

翻译：我们提出切片ReLU注意力，这是一种新型注意力机制，在结构上区别于传统的softmax和基于ReLU的替代方案。该方法不直接对成对点积应用非线性变换，而是对键-查询差值的一维投影进行操作，并利用排序算法实现准线性复杂度。该结构产生一个可微、非对称的核函数，可通过排序过程以O(n log(n))复杂度计算，适用于极长上下文场景。除计算优势外，该模型保持强大的理论表达能力：我们建立了两个在上下文表达能力方面的结果（此前仅见于softmax注意力），证明切片ReLU注意力能够执行非平凡的序列到序列解耦任务，并满足上下文通用逼近性质。最后，我们通过小规模实验展示了该核函数的潜在应用价值。

0

相关内容

ReLU

【CVPR2024】学习视觉Transformer的相关结构

【CVPR2024】学习视觉Transformer的相关结构

专知会员服务

27+阅读 · 2024年4月8日

【ICML2023】MetaModulation: 用更少任务进行小样本学习的变分特征层次结构学习

【ICML2023】MetaModulation: 用更少任务进行小样本学习的变分特征层次结构学习

专知会员服务

35+阅读 · 2023年5月22日

【NeurIPS2022】VICRegL:局部视觉特征的自监督学习

【NeurIPS2022】VICRegL:局部视觉特征的自监督学习

专知会员服务

32+阅读 · 2022年10月6日

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

专知会员服务

17+阅读 · 2022年5月10日

UTC: 用于视觉对话的任务间对比学习的统一Transformer

UTC: 用于视觉对话的任务间对比学习的统一Transformer

专知会员服务

14+阅读 · 2022年5月4日

【ICML2021】基于低秩重参数化的大规模私有学习

专知会员服务

12+阅读 · 2021年6月20日

MonoGRNet：单目3D目标检测的通用框架（TPAMI2021）

MonoGRNet：单目3D目标检测的通用框架（TPAMI2021）

专知会员服务

18+阅读 · 2021年5月3日

【AAAI2021】近似梯度下降的学习图神经网络

专知会员服务

20+阅读 · 2020年12月9日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知会员服务

28+阅读 · 2020年4月1日

【ICLR2020-MIT】元学习的好奇心算法，Meta-learning curiosity algorithms

【ICLR2020-MIT】元学习的好奇心算法，Meta-learning curiosity algorithms

专知会员服务

34+阅读 · 2020年3月13日

AAAI 2022 | ProtGNN：自解释图神经网络

AAAI 2022 | ProtGNN：自解释图神经网络

专知

10+阅读 · 2022年2月28日

【CVPR2021】CausalVAE: 引入因果结构的解耦表示学习

【CVPR2021】CausalVAE: 引入因果结构的解耦表示学习

专知

19+阅读 · 2021年3月28日

图节点嵌入(Node Embeddings)概述，9页pdf

图节点嵌入(Node Embeddings)概述，9页pdf

专知

15+阅读 · 2020年8月22日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

31+阅读 · 2018年7月12日

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

统计学习与视觉计算组

44+阅读 · 2018年4月25日

LibRec 每周算法：DeepFM

LibRec 每周算法：DeepFM

LibRec智能推荐

14+阅读 · 2017年11月6日

Spark机器学习：矩阵及推荐算法

Spark机器学习：矩阵及推荐算法

LibRec智能推荐

16+阅读 · 2017年8月3日

语义分割中的深度学习方法全解：从FCN、SegNet到DeepLab

语义分割中的深度学习方法全解：从FCN、SegNet到DeepLab

炼数成金订阅号

26+阅读 · 2017年7月10日

MNIST入门：贝叶斯方法

MNIST入门：贝叶斯方法

Python程序员

23+阅读 · 2017年7月3日

不确定分数阶非线性系统Mittag-Leffler自适应控制

国家自然科学基金

1+阅读 · 2016年12月31日

基于量子随机行走智能处理的理论和方法

国家自然科学基金

0+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

切换系统的容错保成本和容错H无穷控制

国家自然科学基金

0+阅读 · 2015年12月31日

非线性系统输入状态稳定性分析与设计的不定向量Lyapunov函数导数方法

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于分层稀疏表示的微动目标ISAR三维层析成像技术

国家自然科学基金

1+阅读 · 2015年12月31日

Jacobi行列式和Hilbert变换中的若干问题及应用

国家自然科学基金

0+阅读 · 2014年12月31日

提高移动最小二乘近似无网格方法计算效率的技术和理论

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

DeepSeek-V3 Technical Report

Arxiv

18+阅读 · 2024年12月27日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

42+阅读 · 2023年4月19日

A Comprehensive Survey on Deep Graph Representation Learning

Arxiv

109+阅读 · 2023年4月11日

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Arxiv

231+阅读 · 2023年4月7日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

87+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

499+阅读 · 2023年3月31日

Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

Arxiv

154+阅读 · 2023年3月29日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

64+阅读 · 2023年3月29日

Nature Language Reasoning, A Survey

Arxiv

83+阅读 · 2023年3月26日

Data-centric Artificial Intelligence: A Survey

Arxiv

27+阅读 · 2023年3月17日

VIP会员

文章信息

相关主题

相关VIP内容

【CVPR2024】学习视觉Transformer的相关结构

【CVPR2024】学习视觉Transformer的相关结构

专知会员服务

27+阅读 · 2024年4月8日

【ICML2023】MetaModulation: 用更少任务进行小样本学习的变分特征层次结构学习

【ICML2023】MetaModulation: 用更少任务进行小样本学习的变分特征层次结构学习

专知会员服务

35+阅读 · 2023年5月22日

【NeurIPS2022】VICRegL:局部视觉特征的自监督学习

【NeurIPS2022】VICRegL:局部视觉特征的自监督学习

专知会员服务

32+阅读 · 2022年10月6日

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

专知会员服务

17+阅读 · 2022年5月10日

UTC: 用于视觉对话的任务间对比学习的统一Transformer

UTC: 用于视觉对话的任务间对比学习的统一Transformer

专知会员服务

14+阅读 · 2022年5月4日

【ICML2021】基于低秩重参数化的大规模私有学习

专知会员服务

12+阅读 · 2021年6月20日

MonoGRNet：单目3D目标检测的通用框架（TPAMI2021）

MonoGRNet：单目3D目标检测的通用框架（TPAMI2021）

专知会员服务

18+阅读 · 2021年5月3日

【AAAI2021】近似梯度下降的学习图神经网络

专知会员服务

20+阅读 · 2020年12月9日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知会员服务

28+阅读 · 2020年4月1日

【ICLR2020-MIT】元学习的好奇心算法，Meta-learning curiosity algorithms

【ICLR2020-MIT】元学习的好奇心算法，Meta-learning curiosity algorithms

专知会员服务

34+阅读 · 2020年3月13日

热门VIP内容

开通专知VIP会员享更多权益服务

论学习、公平性与复杂度

《整合杀伤链：一个用于边缘目标验证与战术推理的零样本框架》最新资料

2025中国人工智能学会系列白皮书⸺棋盘上的人工智能|附下载

通用智能体评估的逻辑架构

相关资讯

AAAI 2022 | ProtGNN：自解释图神经网络

AAAI 2022 | ProtGNN：自解释图神经网络

专知

10+阅读 · 2022年2月28日

【CVPR2021】CausalVAE: 引入因果结构的解耦表示学习

【CVPR2021】CausalVAE: 引入因果结构的解耦表示学习

专知

19+阅读 · 2021年3月28日

图节点嵌入(Node Embeddings)概述，9页pdf

图节点嵌入(Node Embeddings)概述，9页pdf

专知

15+阅读 · 2020年8月22日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

31+阅读 · 2018年7月12日

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

统计学习与视觉计算组

44+阅读 · 2018年4月25日

LibRec 每周算法：DeepFM

LibRec 每周算法：DeepFM

LibRec智能推荐

14+阅读 · 2017年11月6日

Spark机器学习：矩阵及推荐算法

Spark机器学习：矩阵及推荐算法

LibRec智能推荐

16+阅读 · 2017年8月3日

语义分割中的深度学习方法全解：从FCN、SegNet到DeepLab

语义分割中的深度学习方法全解：从FCN、SegNet到DeepLab

炼数成金订阅号

26+阅读 · 2017年7月10日

MNIST入门：贝叶斯方法

MNIST入门：贝叶斯方法

Python程序员

23+阅读 · 2017年7月3日

相关论文

DeepSeek-V3 Technical Report

Arxiv

18+阅读 · 2024年12月27日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

42+阅读 · 2023年4月19日

A Comprehensive Survey on Deep Graph Representation Learning

Arxiv

109+阅读 · 2023年4月11日

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Arxiv

231+阅读 · 2023年4月7日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

87+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

499+阅读 · 2023年3月31日

Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

Arxiv

154+阅读 · 2023年3月29日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

64+阅读 · 2023年3月29日

Nature Language Reasoning, A Survey

Arxiv

83+阅读 · 2023年3月26日

Data-centric Artificial Intelligence: A Survey

Arxiv

27+阅读 · 2023年3月17日

相关基金

不确定分数阶非线性系统Mittag-Leffler自适应控制

国家自然科学基金

1+阅读 · 2016年12月31日

基于量子随机行走智能处理的理论和方法

国家自然科学基金

0+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

切换系统的容错保成本和容错H无穷控制

国家自然科学基金

0+阅读 · 2015年12月31日

非线性系统输入状态稳定性分析与设计的不定向量Lyapunov函数导数方法

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于分层稀疏表示的微动目标ISAR三维层析成像技术

国家自然科学基金

1+阅读 · 2015年12月31日

Jacobi行列式和Hilbert变换中的若干问题及应用

国家自然科学基金

0+阅读 · 2014年12月31日

提高移动最小二乘近似无网格方法计算效率的技术和理论

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员