AltDiffusion: A Multilingual Text-to-Image Diffusion Model

Large Text-to-Image(T2I) diffusion models have shown a remarkable capability to produce photorealistic and diverse images based on text inputs. However, existing works only support limited language input, e.g., English, Chinese, and Japanese, leaving users beyond these languages underserved and blocking the global expansion of T2I models. Therefore, this paper presents AltDiffusion, a novel multilingual T2I diffusion model that supports eighteen different languages. Specifically, we first train a multilingual text encoder based on the knowledge distillation. Then we plug it into a pretrained English-only diffusion model and train the model with a two-stage schema to enhance the multilingual capability, including concept alignment and quality improvement stage on a large-scale multilingual dataset. Furthermore, we introduce a new benchmark, which includes Multilingual-General-18(MG-18) and Multilingual-Cultural-18(MC-18) datasets, to evaluate the capabilities of T2I diffusion models for generating high-quality images and capturing culture-specific concepts in different languages. Experimental results on both MG-18 and MC-18 demonstrate that AltDiffusion outperforms current state-of-the-art T2I models, e.g., Stable Diffusion in multilingual understanding, especially with respect to culture-specific concepts, while still having comparable capability for generating high-quality images. All source code and checkpoints could be found in https://github.com/superhero-7/AltDiffuson.

翻译：大型文本到图像（T2I）扩散模型展现出基于文本输入生成逼真且多样化图像的卓越能力。然而，现有工作仅支持有限的语言输入（例如英语、中文和日语），导致使用其他语言的用户服务不足，并阻碍了T2I模型的全球推广。为此，本文提出AltDiffusion，一种支持十八种语言的新型多语言T2I扩散模型。具体而言，我们首先基于知识蒸馏训练一个多语言文本编码器，随后将其嵌入预训练的纯英语扩散模型，并采用两阶段方案（包括大规模多语言数据集上的概念对齐与质量提升阶段）训练模型以增强多语言能力。此外，我们引入了一个新基准，包含多语言通用数据集（MG-18）和多语言文化数据集（MC-18），用于评估T2I扩散模型生成高质量图像以及捕捉不同语言中文化特定概念的能力。在MG-18和MC-18上的实验结果表明，AltDiffusion在多语言理解（尤其是文化特定概念方面）优于当前最先进的T2I模型（如Stable Diffusion），同时仍具备相当的高质量图像生成能力。所有源代码和检查点均可从https://github.com/superhero-7/AltDiffuson 获取。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日