DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · INTERACT · Attention · Notability ·

2023 年 11 月 29 日

DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention

翻译：DeepSpeed-VisualChat: 通过多模态因果注意力实现的多轮多图像交错对话

Zhewei Yao,Xiaoxia Wu,Conglong Li,Minjia Zhang,Heyang Qin,Olatunji Ruwase,Ammar Ahmad Awan,Samyam Rajbhandari,Yuxiong He

Most of the existing multi-modal models, hindered by their incapacity to adeptly manage interleaved image-and-text inputs in multi-image, multi-round dialogues, face substantial constraints in resource allocation for training and data accessibility, impacting their adaptability and scalability across varied interaction realms. To address this, we present the DeepSpeed-VisualChat framework, designed to optimize Large Language Models (LLMs) by incorporating multi-modal capabilities, with a focus on enhancing the proficiency of Large Vision and Language Models in handling interleaved inputs. Our framework is notable for (1) its open-source support for multi-round and multi-image dialogues, (2) introducing an innovative multi-modal causal attention mechanism, and (3) utilizing data blending techniques on existing datasets to assure seamless interactions in multi-round, multi-image conversations. Compared to existing frameworks, DeepSpeed-VisualChat shows superior scalability up to 70B parameter language model size, representing a significant advancement in multi-modal language models and setting a solid foundation for future explorations.

翻译：大多数现有多模态模型受限于无法有效处理多图像、多轮对话中的交错图文输入，在训练资源分配和数据可用性方面面临显著约束，进而影响其在不同交互场景中的适应性和可扩展性。为解决这一问题，我们提出DeepSpeed-VisualChat框架，该框架通过融合多模态能力优化大语言模型（LLMs），重点提升大型视觉语言模型处理交错输入的能力。本框架具有以下显著特征：（1）开源支持多轮多图像对话；（2）引入创新的多模态因果注意力机制；（3）利用现有数据集的数据混合技术保障多轮多图像对话的无缝交互。与现有框架相比，DeepSpeed-VisualChat在高达700亿参数的语言模型规模上展现出优越的可扩展性，标志着多模态语言模型的重大进展，为未来探索奠定了坚实基础。

0

相关内容

语言模型化

语言模型化

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

基于DASH的交互式三维视频系统建模

国家自然科学基金

1+阅读 · 2015年12月31日

Volterra积分微分方程的多区间Chebyshev和Legendre谱配置法

国家自然科学基金

0+阅读 · 2015年12月31日

直接优化半周长线长的VLSI两阶段迭代布局算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Arxiv

0+阅读 · 2024年1月22日

NLCG-Net: A Model-Based Zero-Shot Learning Framework for Undersampled Quantitative MRI Reconstruction

Arxiv

0+阅读 · 2024年1月22日

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Arxiv

0+阅读 · 2024年1月22日

DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing

Arxiv

0+阅读 · 2024年1月19日

Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation

Arxiv

0+阅读 · 2024年1月19日

Learning Position-Aware Implicit Neural Network for Real-World Face Inpainting

Arxiv

0+阅读 · 2024年1月19日

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Arxiv

0+阅读 · 2024年1月18日

BPDO:Boundary Points Dynamic Optimization for Arbitrary Shape Scene Text Detection

Arxiv

0+阅读 · 2024年1月18日

Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

Arxiv

14+阅读 · 2021年11月11日

UNITER: Learning UNiversal Image-TExt Representations

UNITER: Learning UNiversal Image-TExt Representations

Arxiv

23+阅读 · 2019年9月25日

VIP会员

文章信息

相关主题

语言模型化

最新内容

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

专知会员服务

11+阅读 · 6月15日

消耗优势：美军的“精确规模化”概念

消耗优势：美军的“精确规模化”概念

专知会员服务

6+阅读 · 6月15日

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

专知会员服务

7+阅读 · 6月15日

《网络空间兵棋推演：挑战、局限性与混合路径》报告

《网络空间兵棋推演：挑战、局限性与混合路径》报告

专知会员服务

6+阅读 · 6月15日

《离线语言支持系统：面向空战战术决策》

《离线语言支持系统：面向空战战术决策》

专知会员服务

7+阅读 · 6月15日

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

专知会员服务

5+阅读 · 6月15日

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

专知会员服务

5+阅读 · 6月14日

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

专知会员服务

5+阅读 · 6月14日

俄乌战场地面机器人如何改写战争规则

俄乌战场地面机器人如何改写战争规则

专知会员服务

9+阅读 · 6月14日

美国海军研究生院第23届年度采购研究研讨会与创新峰会：主题“加速作战能力”，附会议报告论文集1300页

美国海军研究生院第23届年度采购研究研讨会与创新峰会：主题“加速作战能力”，附会议报告论文集1300页

专知会员服务

10+阅读 · 6月14日

《新空中力量概念：来自敏捷战斗运用的启示》2026最新50页报告

《新空中力量概念：来自敏捷战斗运用的启示》2026最新50页报告

专知会员服务

12+阅读 · 6月14日

《无人水面艇文献综述与结构设计》135页

《无人水面艇文献综述与结构设计》135页

专知会员服务

14+阅读 · 6月13日

《自主蜂群系统的战略架构：多域一体化、抗毁韧性及海上作战框架（2025—2035）》46页报告

《自主蜂群系统的战略架构：多域一体化、抗毁韧性及海上作战框架（2025—2035）》46页报告

专知会员服务

11+阅读 · 6月13日

ICML 2026｜MEMOPILOT：用强化学习训练会进化的智能体记忆

ICML 2026｜MEMOPILOT：用强化学习训练会进化的智能体记忆

专知会员服务

2+阅读 · 6月13日

智能体时间序列系统全景综述：架构、可靠性与研究前沿

智能体时间序列系统全景综述：架构、可靠性与研究前沿

专知会员服务

11+阅读 · 6月13日

相关VIP内容

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

热门VIP内容

开通专知VIP会员享更多权益服务

消耗优势：美军的“精确规模化”概念

《网络空间兵棋推演：挑战、局限性与混合路径》报告

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

相关论文

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Arxiv

0+阅读 · 2024年1月22日

NLCG-Net: A Model-Based Zero-Shot Learning Framework for Undersampled Quantitative MRI Reconstruction

Arxiv

0+阅读 · 2024年1月22日

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Arxiv

0+阅读 · 2024年1月22日

DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing

Arxiv

0+阅读 · 2024年1月19日

Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation

Arxiv

0+阅读 · 2024年1月19日

Learning Position-Aware Implicit Neural Network for Real-World Face Inpainting

Arxiv

0+阅读 · 2024年1月19日

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Arxiv

0+阅读 · 2024年1月18日

BPDO:Boundary Points Dynamic Optimization for Arbitrary Shape Scene Text Detection

Arxiv

0+阅读 · 2024年1月18日

Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

Arxiv

14+阅读 · 2021年11月11日

UNITER: Learning UNiversal Image-TExt Representations

UNITER: Learning UNiversal Image-TExt Representations

Arxiv

23+阅读 · 2019年9月25日

相关基金

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

基于DASH的交互式三维视频系统建模

国家自然科学基金

1+阅读 · 2015年12月31日

Volterra积分微分方程的多区间Chebyshev和Legendre谱配置法

国家自然科学基金

0+阅读 · 2015年12月31日

直接优化半周长线长的VLSI两阶段迭代布局算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员