Structured Recurrent Mixers for Massively Parallelized Sequence Generation - 专知论文

会员服务 ·

0

推断 · MoDELS · 表示 · Processing（编程语言） · 缩放 ·

Structured Recurrent Mixers for Massively Parallelized Sequence Generation

翻译：暂无翻译

Benjamin L. Badger

Over the last two decades, language modeling has experienced a shift from the use of predominantly recurrent architectures that process tokens sequentially during training and inference to non-recurrent models that process sequence elements in parallel during training, which results in greater training efficiency and stability at the expense of lower inference throughput. Here we introduce the Structured Recurrent Mixer, an architecture that allows for algebraic conversion between a sequence parallel representation at train time and a recurrent representation at inference, notably without the need for specialized kernels or device-specific memory management. We show experimentally that this dual representation allows for greater training efficiency, higher input information capacity, and larger inference throughput and concurrency when compared to other linear complexity models. We postulate that recurrent models are poorly suited to extended sequence length scaling for information-rich inputs typical of language, but are well suited to scaling in the sample (batch) dimension due to their constant memory per sample. We provide Mojo/MAX inference implementations of SRMs exhibiting 12x the throughput and 170x the concurrency of similarly powerful Transformers inferenced on vLLM, increases characteristic of Pytorch implementations resulting in a 30\% increase in compute-constant GSM8k Pass@k. We conclude by demonstrating that SRMs are effective reinforcement learning training candidates.

翻译：暂无翻译

0

相关内容

【阿姆斯特丹博士论文】在语言模型中寻找结构

【阿姆斯特丹博士论文】在语言模型中寻找结构

专知会员服务

26+阅读 · 2024年11月27日

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

专知会员服务

13+阅读 · 2022年3月12日

【CVPR 2022】连续驾驶场景与不断增长的建筑的连续立体匹配，Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture

【CVPR 2022】连续驾驶场景与不断增长的建筑的连续立体匹配，Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture

专知会员服务

11+阅读 · 2022年3月12日

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

专知会员服务

10+阅读 · 2022年3月12日

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

【ICLR2022】时序对齐预测的监督表示学习与少样本序列分类

【ICLR2022】时序对齐预测的监督表示学习与少样本序列分类

专知会员服务

21+阅读 · 2022年2月5日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

【AAAI2020】实体关系联合抽取的编码器-解码器结构的有效建模（ Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction）

【AAAI2020】实体关系联合抽取的编码器-解码器结构的有效建模（ Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction）

专知会员服务

53+阅读 · 2019年11月22日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

专知

36+阅读 · 2022年2月7日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

近期语音类前沿论文

近期语音类前沿论文

深度学习每日摘要

14+阅读 · 2019年3月17日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

CVPR 2018 | 伯克利等提出无监督特征学习新方法，代码已开源

CVPR 2018 | 伯克利等提出无监督特征学习新方法，代码已开源

AI前线

12+阅读 · 2018年5月13日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

语料库构建——自然语言理解的基础

语料库构建——自然语言理解的基础

计算机研究与发展

11+阅读 · 2017年8月21日

考虑压拱和悬链线效应的RC框架结构反应修正系数研究

国家自然科学基金

0+阅读 · 2015年12月31日

RC框架-框桁式复合墙混合抗侧力体系抗震性能研究

国家自然科学基金

0+阅读 · 2015年12月31日

汉英篇章衔接对齐资源构建与分析研究

国家自然科学基金

2+阅读 · 2015年12月31日

RC框架-阶梯墙新型抗震结构关键问题研究

国家自然科学基金

0+阅读 · 2015年12月31日

具有重构特征的系统可靠性建模方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向混凝土梁桥结构状态评估的非线性有限元模型修正研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于分层图结构化稀疏低秩表示的目标联合分割方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

考虑非线性孔隙流的近海结构物海床地基内波致渐进液化研究

国家自然科学基金

0+阅读 · 2014年12月31日

一种全新的结构修改重分析方法及其应用

国家自然科学基金

0+阅读 · 2014年12月31日

不确定结构可靠寿命设计的时变高精度模型和序列优化问题研究

国家自然科学基金

0+阅读 · 2014年12月31日

Social Structure Matters in 3D Human-Human Interaction Generation

Arxiv

0+阅读 · 6月23日

Diffusion Models Adapt to Low-Dimensional Structure Under Flexible Coefficient Choices

Arxiv

0+阅读 · 6月22日

Numerical stability analysis of large language models

Arxiv

0+阅读 · 6月22日

Measuring Behavior Portability in Large Language Models

Arxiv

0+阅读 · 6月22日

Trajectory Forcing: Structure-First Generation with Controllable Semantic Trajectories

Arxiv

0+阅读 · 6月21日

Toward Calibrated Mixture-of-Experts Under Distribution Shift

Arxiv

0+阅读 · 6月18日

Locality in Residuated-Lattice Structures

Arxiv

0+阅读 · 6月17日

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

Arxiv

0+阅读 · 6月17日

Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

Arxiv

0+阅读 · 6月17日

Approximate Structured Diffusion for Sequence Labelling

Arxiv

0+阅读 · 6月17日

VIP会员

文章信息

相关主题

Processing（编程语言）

最新内容

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

专知会员服务

5+阅读 · 今天8:00

重新思考无人机时代的生存能力

重新思考无人机时代的生存能力

专知会员服务

3+阅读 · 今天7:44

装甲突击旅：现代战争思考、战斗与组织

装甲突击旅：现代战争思考、战斗与组织

专知会员服务

3+阅读 · 今天7:28

在人工智能加速决策环境中拓展OODA循环

在人工智能加速决策环境中拓展OODA循环

专知会员服务

4+阅读 · 今天7:18

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰与伊朗案例研究》

专知会员服务

5+阅读 · 今天7:07

军事欺骗：供作战战术指挥官使用的工具

军事欺骗：供作战战术指挥官使用的工具

专知会员服务

4+阅读 · 今天7:03

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

ICML 2026 | CFPO：用反事实策略优化提升多模态推理

专知会员服务

4+阅读 · 6月23日

综述 | 世界动作模型：少做梦，多行动

综述 | 世界动作模型：少做梦，多行动

专知会员服务

5+阅读 · 6月23日

美以伊冲突：无人机与人工智能的运用

美以伊冲突：无人机与人工智能的运用

专知会员服务

10+阅读 · 6月23日

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

《战时图神经网络：整合以色列-伊朗冲突中的网络安全与无人机智能》最新50页文献

专知会员服务

4+阅读 · 6月23日

《特种部队在透明战场中的生存力》最新报告

《特种部队在透明战场中的生存力》最新报告

专知会员服务

5+阅读 · 6月23日

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

《自主无人机蜂群协同与控制系统：人工智能赋能的战场协同与自主任务编排平台》

专知会员服务

8+阅读 · 6月23日

《人工智能生成的零日漏洞：对未来作战的影响》

《人工智能生成的零日漏洞：对未来作战的影响》

专知会员服务

7+阅读 · 6月23日

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

《理解伙伴国在防务能力选择中的偏好：探索美国解决方案的替代选择》美智库200页报告

专知会员服务

4+阅读 · 6月23日

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

6+阅读 · 6月22日

相关VIP内容

【阿姆斯特丹博士论文】在语言模型中寻找结构

【阿姆斯特丹博士论文】在语言模型中寻找结构

专知会员服务

26+阅读 · 2024年11月27日

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

专知会员服务

13+阅读 · 2022年3月12日

【CVPR 2022】连续驾驶场景与不断增长的建筑的连续立体匹配，Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture

【CVPR 2022】连续驾驶场景与不断增长的建筑的连续立体匹配，Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture

专知会员服务

11+阅读 · 2022年3月12日

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

专知会员服务

10+阅读 · 2022年3月12日

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

【ICLR2022】时序对齐预测的监督表示学习与少样本序列分类

【ICLR2022】时序对齐预测的监督表示学习与少样本序列分类

专知会员服务

21+阅读 · 2022年2月5日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

【AAAI2020】实体关系联合抽取的编码器-解码器结构的有效建模（ Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction）

【AAAI2020】实体关系联合抽取的编码器-解码器结构的有效建模（ Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction）

专知会员服务

53+阅读 · 2019年11月22日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

重新思考无人机时代的生存能力

在人工智能加速决策环境中拓展OODA循环

反无人机拦截器训练与运用课程：对美国陆军部队发展的启示

装甲突击旅：现代战争思考、战斗与组织

相关资讯

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

专知

36+阅读 · 2022年2月7日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

近期语音类前沿论文

近期语音类前沿论文

深度学习每日摘要

14+阅读 · 2019年3月17日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

利用动态深度学习预测金融时间序列基于Python

利用动态深度学习预测金融时间序列基于Python

量化投资与机器学习

18+阅读 · 2018年10月30日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

CVPR 2018 | 伯克利等提出无监督特征学习新方法，代码已开源

CVPR 2018 | 伯克利等提出无监督特征学习新方法，代码已开源

AI前线

12+阅读 · 2018年5月13日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

语料库构建——自然语言理解的基础

语料库构建——自然语言理解的基础

计算机研究与发展

11+阅读 · 2017年8月21日

相关论文

Social Structure Matters in 3D Human-Human Interaction Generation

Arxiv

0+阅读 · 6月23日

Diffusion Models Adapt to Low-Dimensional Structure Under Flexible Coefficient Choices

Arxiv

0+阅读 · 6月22日

Numerical stability analysis of large language models

Arxiv

0+阅读 · 6月22日

Measuring Behavior Portability in Large Language Models

Arxiv

0+阅读 · 6月22日

Trajectory Forcing: Structure-First Generation with Controllable Semantic Trajectories

Arxiv

0+阅读 · 6月21日

Toward Calibrated Mixture-of-Experts Under Distribution Shift

Arxiv

0+阅读 · 6月18日

Locality in Residuated-Lattice Structures

Arxiv

0+阅读 · 6月17日

Diffusion-Proof: Recipe for Formal Theorem Proving Beyond Auto-Regressive Generation

Arxiv

0+阅读 · 6月17日

Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

Arxiv

0+阅读 · 6月17日

Approximate Structured Diffusion for Sequence Labelling

Arxiv

0+阅读 · 6月17日

相关基金

考虑压拱和悬链线效应的RC框架结构反应修正系数研究

国家自然科学基金

0+阅读 · 2015年12月31日

RC框架-框桁式复合墙混合抗侧力体系抗震性能研究

国家自然科学基金

0+阅读 · 2015年12月31日

汉英篇章衔接对齐资源构建与分析研究

国家自然科学基金

2+阅读 · 2015年12月31日

RC框架-阶梯墙新型抗震结构关键问题研究

国家自然科学基金

0+阅读 · 2015年12月31日

具有重构特征的系统可靠性建模方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向混凝土梁桥结构状态评估的非线性有限元模型修正研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于分层图结构化稀疏低秩表示的目标联合分割方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

考虑非线性孔隙流的近海结构物海床地基内波致渐进液化研究

国家自然科学基金

0+阅读 · 2014年12月31日

一种全新的结构修改重分析方法及其应用

国家自然科学基金

0+阅读 · 2014年12月31日

不确定结构可靠寿命设计的时变高精度模型和序列优化问题研究

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员