Continuous Conditional Diffusion Model (CCDM) is a diffusion-based framework designed to generate high-quality images conditioned on continuous regression labels. Although CCDM has demonstrated clear advantages over prior approaches across a range of datasets, it still exhibits notable limitations and has recently been surpassed by a GAN-based method, CcGAN-AVAR. These limitations mainly arise from its reliance on an outdated diffusion framework and from the low sampling efficiency caused by its long sampling trajectories. To address these issues, we propose an improved CCDM framework, termed iCCDM, which builds on the more advanced \textit{Elucidated Diffusion Model} (EDM) framework with substantial modifications to improve both generation quality and sampling efficiency. Specifically, iCCDM introduces a novel matrix-form EDM formulation together with an adaptive vicinal training strategy. Extensive experiments on four benchmark datasets, spanning image resolutions from $64\times64$ to $256\times256$, demonstrate that iCCDM consistently outperforms existing methods, including state-of-the-art large-scale text-to-image diffusion models (e.g., Stable Diffusion 3, FLUX.1, and Qwen-Image), achieving higher generation quality while significantly reducing sampling cost.