Recent advances in image generation, particularly via diffusion models, have led to impressive improvements in image synthesis quality. Despite this, diffusion models are still challenged by model-induced artifacts and limited stability in image fidelity. In this work, we hypothesize that a primary cause of these issues is the improper resampling operation that introduces aliasing in the diffusion model, and that careful, alias-free resampling dictated by image-processing theory can improve the model's performance in image synthesis. We propose integrating alias-free resampling layers into the UNet architecture of diffusion models without adding extra trainable parameters, thereby maintaining computational efficiency. We then assess whether these theory-driven modifications enhance image quality and rotational equivariance. Our experimental results on benchmark datasets, including CIFAR-10, MNIST, and MNIST-M, reveal consistent gains in image quality, particularly in FID and KID scores. Furthermore, we propose a modified diffusion process that enables user-controlled rotation of generated images without additional training. Our findings highlight the potential of theory-driven enhancements such as alias-free resampling to improve image quality in generative models while maintaining model efficiency, and they point to future research directions, such as incorporating these techniques into video-generating diffusion models, enabling deeper exploration of alias-free resampling in generative modeling.
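To make the core idea concrete, the sketch below contrasts naive subsampling with alias-free downsampling on a high-frequency pattern. It is a minimal illustration, not the paper's implementation: the particular low-pass kernel (a separable 3×3 binomial filter) is an assumption chosen for simplicity; the key property it shares with the proposed layers is that the filter is fixed, so no trainable parameters are added.

```python
import numpy as np
from scipy.signal import convolve2d

def alias_free_downsample(x):
    """Band-limit with a fixed low-pass filter, then subsample by 2.

    The 3x3 binomial kernel is an illustrative choice (an assumption,
    not the paper's exact filter). It has no trainable parameters.
    """
    k1 = np.array([1.0, 2.0, 1.0]) / 4.0
    kernel = np.outer(k1, k1)  # separable 2-D binomial low-pass filter
    blurred = convolve2d(x, kernel, mode="same", boundary="symm")
    return blurred[::2, ::2]   # subsampling is now (approximately) safe

def naive_downsample(x):
    """Drop every other sample with no band-limiting: frequencies above
    the new Nyquist limit alias into the output."""
    return x[::2, ::2]

# A checkerboard is the highest-frequency pattern an image grid can hold.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)

# Naive subsampling collapses the checkerboard to a constant image
# (severe aliasing); the filtered path preserves its average energy.
print(naive_downsample(checker))       # all zeros
print(alias_free_downsample(checker))  # values near 0.5
```

The same blur-then-subsample principle (and its transpose for upsampling) is what alias-free resampling layers apply inside the UNet's down- and up-sampling stages.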