We present NextFlow, a unified decoder-only autoregressive transformer trained on 6 trillion interleaved text-image discrete tokens. By leveraging a unified vision representation within a single autoregressive architecture, NextFlow natively supports both multimodal understanding and generation, unlocking image editing, interleaved content generation, and video generation. Motivated by the distinct nature of the two modalities (text is strictly sequential, while images are inherently hierarchical), we retain next-token prediction for text but adopt next-scale prediction for visual generation. This departs from traditional raster-scan methods and enables the generation of 1024x1024 images in just 5 seconds, orders of magnitude faster than comparable autoregressive models. We address the instabilities of multi-scale generation through a robust training recipe, and we further introduce a prefix-tuning strategy for reinforcement learning. Experiments demonstrate that NextFlow achieves state-of-the-art performance among unified models and rivals specialized diffusion baselines in visual quality.
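To see why next-scale prediction yields such a speedup, consider the number of sequential decoding steps rather than the total token count. The back-of-the-envelope sketch below assumes a 16x-downsampling VQ tokenizer and a hypothetical scale schedule (neither is specified in the abstract); the key point is that raster-scan decoding needs one forward pass per token, whereas next-scale decoding needs one pass per scale, with all tokens inside a scale predicted in parallel.

```python
# Minimal sketch of the sequential-step arithmetic behind next-scale
# prediction. The tokenizer stride (16x) and the scale schedule below
# are illustrative assumptions, not values from the paper.

latent_side = 1024 // 16           # 64x64 token grid for a 1024px image
raster_steps = latent_side ** 2    # raster-scan: one forward pass per token
print(f"raster-scan sequential steps: {raster_steps}")  # 4096

# Hypothetical coarse-to-fine schedule: each entry is the side length of
# one token map; every token within a scale is emitted in a single
# parallel step, so sequential cost equals the number of scales.
scales = [1, 2, 3, 4, 6, 9, 13, 18, 24, 32, 48, 64]
total_tokens = sum(s * s for s in scales)
print(f"next-scale sequential steps: {len(scales)} "
      f"({total_tokens} tokens decoded scale-by-scale)")
```

Under these assumptions the sequential depth drops from 4096 steps to about a dozen, which is consistent with the claimed orders-of-magnitude latency gap over raster-scan autoregressive models.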