Recent advances in image generation, particularly via diffusion models, have led to impressive improvements in image synthesis quality. Despite this, diffusion models are still challenged by model-induced artifacts and limited stability in image fidelity. In this work, we hypothesize that a primary cause of these issues is the improper resampling operation that introduces aliasing in the diffusion model, and that careful, alias-free resampling dictated by image-processing theory can improve the model's performance in image synthesis. We propose integrating alias-free resampling layers into the UNet architecture of diffusion models without adding extra trainable parameters, thereby maintaining computational efficiency. We then assess whether these theory-driven modifications enhance image quality and rotational equivariance. Our experimental results on benchmark datasets, including CIFAR-10, MNIST, and MNIST-M, reveal consistent gains in image quality, particularly in FID and KID scores. Furthermore, we propose a modified diffusion process that enables user-controlled rotation of generated images without additional training. Our findings highlight the potential of theory-driven enhancements such as alias-free resampling to improve image quality in generative models while maintaining model efficiency, and they point to future research directions, such as incorporating these techniques into video-generating diffusion models, enabling deeper exploration of alias-free resampling in generative modeling.
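To make the core idea concrete, the sketch below contrasts naive subsampling with alias-free downsampling on a high-frequency pattern. It is a minimal illustration, not the paper's implementation: the particular low-pass kernel (a separable 3×3 binomial filter) is an assumption chosen for simplicity; the key property it shares with the proposed layers is that the filter is fixed, so no trainable parameters are added.

```python
import numpy as np
from scipy.signal import convolve2d

def alias_free_downsample(x):
    """Band-limit with a fixed low-pass filter, then subsample by 2.

    The 3x3 binomial kernel is an illustrative choice (an assumption,
    not the paper's exact filter). It has no trainable parameters.
    """
    k1 = np.array([1.0, 2.0, 1.0]) / 4.0
    kernel = np.outer(k1, k1)  # separable 2-D binomial low-pass filter
    blurred = convolve2d(x, kernel, mode="same", boundary="symm")
    return blurred[::2, ::2]   # subsampling is now (approximately) safe

def naive_downsample(x):
    """Drop every other sample with no band-limiting: frequencies above
    the new Nyquist limit alias into the output."""
    return x[::2, ::2]

# A checkerboard is the highest-frequency pattern an image grid can hold.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)

# Naive subsampling collapses the checkerboard to a constant image
# (severe aliasing); the filtered path preserves its average energy.
print(naive_downsample(checker))       # all zeros
print(alias_free_downsample(checker))  # values near 0.5
```

The same blur-then-subsample principle (and its transpose for upsampling) is what alias-free resampling layers apply inside the UNet's down- and up-sampling stages.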