Diffusion models have significantly advanced generative AI for editing and creating naturalistic images. However, efficiently improving the quality of generated images remains of paramount interest. In this context, we propose a generic "naturalness"-preserving loss function, the kurtosis concentration (KC) loss, which can be readily applied to any standard diffusion model pipeline to improve image quality. Our motivation stems from the projected kurtosis concentration property of natural images, which states that natural images have nearly constant kurtosis values across different band-pass versions of the image. To retain the "naturalness" of the generated images, we minimize the gap between the highest and lowest kurtosis values across band-pass versions of the images (obtained, e.g., with the Discrete Wavelet Transform (DWT)). Note that our approach does not require any additional guidance, such as classifier or classifier-free guidance, to improve image quality. We validate the proposed approach on three diverse tasks: (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, and (3) image super-resolution. Integrating the proposed KC loss improves perceptual quality across all these tasks in terms of FID, MUSIQ score, and user evaluation.
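As an illustration of the quantity described above, the following is a minimal NumPy sketch of a kurtosis-gap measure. It rests on several assumptions that the abstract does not specify: a single-level Haar transform stands in for the DWT, only the three detail subbands are treated as the band-pass versions, kurtosis is the non-excess fourth standardized moment, and the function names (`haar_detail_bands`, `kurtosis`, `kc_gap`) are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def haar_detail_bands(img):
    """One-level 2D Haar transform of a grayscale image.

    Returns the three detail (band-pass) subbands. The Haar filter is an
    assumption made here; the abstract only names the DWT as an example.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]  # crop to even height and width
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    row_diff = (a + b - c - d) / 2.0   # difference between the two rows
    col_diff = (a - b + c - d) / 2.0   # difference between the two columns
    diag_diff = (a - b - c + d) / 2.0  # diagonal difference
    return [row_diff, col_diff, diag_diff]


def kurtosis(x, eps=1e-12):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    x = np.ravel(x)
    centered = x - x.mean()
    var = np.mean(centered ** 2)
    return np.mean(centered ** 4) / (var ** 2 + eps)


def kc_gap(img):
    """Gap between the highest and lowest kurtosis across the subbands."""
    values = np.array([kurtosis(band) for band in haar_detail_bands(img)])
    return float(values.max() - values.min())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(size=(256, 256))
    # Each subband of Gaussian noise is itself Gaussian, so the gap is small.
    print(kc_gap(noise))
```

To use such a term as a training loss in a diffusion pipeline, the same computation would have to be written in a differentiable framework and added to the model's existing objective; how the term is weighted is not stated in the abstract.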