Diffusion-based image generation models can enhance image quality when conditioned on ground-truth labels. Here, we conduct a comprehensive experimental study on image-level conditioning for diffusion models using cluster assignments. We investigate how individual clustering determinants, such as the number of clusters and the clustering method, impact image synthesis across three different datasets. When the number of clusters is chosen optimally with respect to image synthesis, we show that cluster-conditioning can achieve state-of-the-art performance, with an FID of 1.67 for CIFAR10 and 2.17 for CIFAR100, along with a strong increase in training sample efficiency. We further propose a novel empirical method to estimate an upper bound for the optimal number of clusters. In contrast to existing approaches, we find no significant association between clustering performance and the corresponding cluster-conditional FID scores. The code is available at https://github.com/HHU-MMBS/cedm-official-wavc2025.