This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, short for xGen-MultiModal, expands the Salesforce xGen initiative on foundation AI models. Our models undergo rigorous evaluation across a range of tasks, including both single- and multi-image benchmarks. Our pre-trained base model exhibits strong in-context learning capabilities, and the instruction-tuned model demonstrates competitive performance among open-source LMMs of similar size. In addition, we introduce a model safety-tuned with DPO, aiming to mitigate harmful behaviors such as hallucinations and to improve safety. We open-source our models, our curated large-scale datasets, and our fine-tuning codebase to facilitate further advancements in LMM research. Associated resources will be available on our project page above.