Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

Stable Diffusion, a generative model for text-to-image synthesis, frequently exhibits resolution-induced composition problems when generating images of varying sizes. This issue stems primarily from the model being trained on pairs of single-scale images and their corresponding text descriptions. Moreover, training directly on images of unlimited sizes is infeasible, as it would require an immense number of text-image pairs and substantial computational expense. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed images of any size while minimizing the need for high-memory GPU resources. Specifically, the first stage, dubbed Any Ratio Adaptability Diffusion (ARAD), uses a selected set of images with a restricted range of aspect ratios to optimize the text-conditional diffusion model, improving its ability to adjust composition to diverse image sizes. To support the creation of images at any desired size, the second stage introduces a technique called Fast Seamless Tiled Diffusion (FSTD). This method rapidly enlarges the ASD output to any high-resolution size without seaming artifacts or memory overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks demonstrate that ASD can produce well-structured images of arbitrary sizes while cutting inference time by 2x compared to the traditional tiled algorithm.
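The abstract describes the pipeline only at a high level. The sketch below illustrates, under stated assumptions, two bookkeeping steps a two-stage pipeline of this kind needs. The first step maps a requested output size to the nearest first-stage aspect-ratio bucket, which is the role a restricted ratio set plays in ARAD. The second step covers the enlarged canvas with overlapping fixed-size tiles, so that no single denoising call has to hold the full image in memory, and averages the tile outputs where they overlap. The bucket list, tile size, overlap, and the names `nearest_bucket`, `tile_boxes`, and `fuse_tiles` are hypothetical. Plain overlap averaging is the generic tiled-diffusion fusion, not FSTD itself, which the authors present as faster than the traditional tiled algorithm.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive

# Hypothetical first-stage buckets (width, height), each roughly 512*512 pixels
# and divisible by 64. ASD's actual ratio set is not given in the abstract.
BUCKETS: List[Tuple[int, int]] = [
    (512, 512), (576, 448), (448, 576), (640, 384),
    (384, 640), (768, 320), (320, 768),
]


def nearest_bucket(width: int, height: int) -> Tuple[int, int]:
    """Return the bucket whose aspect ratio is closest to width/height.
    Ratios are compared in log space so that 2:1 and 1:2 are symmetric."""
    target = math.log(width / height)
    return min(BUCKETS, key=lambda b: abs(math.log(b[0] / b[1]) - target))


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles covering [0, length); neighbours share at least
    `overlap` pixels and the last tile is aligned to the end."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, tile - overlap))
    starts.append(length - tile)
    return starts


def tile_boxes(width: int, height: int, tile: int = 512, overlap: int = 64) -> List[Box]:
    """Row-major list of overlapping tile boxes covering a width x height canvas."""
    tw, th = min(tile, width), min(tile, height)
    return [(x, y, x + tw, y + th)
            for y in tile_starts(height, tile, overlap)
            for x in tile_starts(width, tile, overlap)]


def fuse_tiles(width: int, height: int, boxes: List[Box],
               tiles: List[List[List[float]]]) -> List[List[float]]:
    """Average per-tile outputs into one canvas (plain mean over overlaps).
    Assumes the boxes cover every pixel, as tile_boxes guarantees."""
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for (x0, y0, x1, y1), t in zip(boxes, tiles):
        for j in range(y1 - y0):
            for i in range(x1 - x0):
                acc[y0 + j][x0 + i] += t[j][i]
                cnt[y0 + j][x0 + i] += 1
    return [[acc[y][x] / cnt[y][x] for x in range(width)] for y in range(height)]


if __name__ == "__main__":
    w, h = 3840, 1600                  # example requested output size
    print(nearest_bucket(w, h))        # first-stage size with a matching ratio
    print(len(tile_boxes(w, h)))       # number of 512x512 tiles for the enlargement
```

For a 3840x1600 request, the 2.4:1 bucket (768, 320) is selected and the canvas is split into 36 overlapping tiles. Averaging is the simplest way to hide tile borders, but it requires extra computation in every overlapping region, which is the cost a faster seamless scheme aims to reduce.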
