Diffusion models have revolutionized image generation in recent years, yet they remain limited to a few sizes and aspect ratios. We propose ElasticDiffusion, a novel training-free decoding method that enables pretrained text-to-image diffusion models to generate images at various sizes. ElasticDiffusion attempts to decouple the generation trajectory of a pretrained model into local and global signals. The local signal controls low-level pixel information and can be estimated on local patches, while the global signal maintains overall structural consistency and is estimated with a reference image. We test our method on CelebA-HQ (faces) and LAION-COCO (objects, indoor scenes, and outdoor scenes). Our experiments and qualitative results show better image coherence across aspect ratios than MultiDiffusion and the standard decoding strategy of Stable Diffusion. Code: https://github.com/MoayedHajiAli/ElasticDiffusion-official.git
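The local/global decoupling described in the abstract can be illustrated with a toy NumPy sketch. This is not the ElasticDiffusion implementation: the function names (`local_signal`, `global_signal`, `combined_update`), the average-pool and nearest-neighbour resampling, and the additive combination with a `weight` are all assumptions chosen for illustration, and the estimators are plain callables standing in for a pretrained diffusion model.

```python
import numpy as np


def local_signal(x, patch, estimator):
    """Apply `estimator` independently to non-overlapping patch x patch tiles.

    Hypothetical stand-in for estimating a low-level signal on local patches.
    """
    h, w = x.shape
    assert h % patch == 0 and w % patch == 0, "size must be divisible by patch"
    out = np.empty_like(x, dtype=float)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            out[i:i + patch, j:j + patch] = estimator(x[i:i + patch, j:j + patch])
    return out


def global_signal(x, factor, estimator):
    """Estimate a signal on a downsampled reference, then upsample it.

    Hypothetical stand-in for estimating global structure with a reference image.
    """
    h, w = x.shape
    assert h % factor == 0 and w % factor == 0, "size must be divisible by factor"
    # Average-pool down to the reference resolution (assumed resampling choice).
    ref = x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    est = estimator(ref)
    # Nearest-neighbour upsample back to the target resolution.
    return np.repeat(np.repeat(est, factor, axis=0), factor, axis=1)


def combined_update(x, patch, factor, local_est, global_est, weight=1.0):
    """Combine both signals additively (an assumption made for this sketch)."""
    return local_signal(x, patch, local_est) + weight * global_signal(x, factor, global_est)


# Toy usage on a non-square 8 x 16 array.
x = np.arange(128, dtype=float).reshape(8, 16)
out = combined_update(
    x, patch=4, factor=4,
    local_est=lambda p: p - p.mean(),  # per-patch detail (toy estimator)
    global_est=lambda r: r,            # coarse structure (toy estimator)
)
```

With these toy estimators, the local term keeps per-patch detail and the global term restores each patch's mean, so the two parts together reproduce the input. In a real pipeline the two estimators would come from a pretrained model's predictions rather than from simple array arithmetic.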