MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

Marco Bellagente, Manuel Brack, Hannah Teufel, Felix Friedrich, Björn Deiseroth, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Koen Oostermeijer, Andres Felipe Cruz-Salinas, Patrick Schramowski, Kristian Kersting, Samuel Weinbach

The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion, which allows one to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MultiFusion leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from individual modules to the downstream model. Specifically, the fusion of all independent components allows the image generation module to utilize multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.
