Variational autoencoders (VAEs) and generative adversarial networks (GANs) have been widely applied to clustering and have achieved significant success. However, the potential of these approaches may be limited by the VAE's mediocre generation capability or the GAN's well-known instability during adversarial training. In contrast, denoising diffusion probabilistic models (DDPMs) are a new and promising class of generative models that may open new directions in clustering. In this study, we introduce an expectation-maximization (EM) framework for clustering using DDPMs. In the E-step, we derive a mixture of Gaussian priors for the subsequent M-step. In the M-step, we learn clustering-friendly latent representations of the data by employing a conditional DDPM and matching the distribution of the latent representations to the mixture of Gaussian priors. We present a rigorous theoretical analysis of the optimization process in the M-step, proving that, under certain constraints, the optimizations are equivalent to maximizing the lower bound of the Q function within the vanilla EM framework. Comprehensive experiments validate the advantages of the proposed framework, showing superior performance in clustering, unsupervised conditional generation, and latent representation learning.
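To make the idea of a mixture of Gaussian priors concrete, the following is a minimal sketch of fitting a Gaussian mixture to a set of latent codes with textbook EM. It is not the procedure of the proposed framework, whose E-step and conditional-DDPM M-step are not specified in the abstract. The function name `fit_gaussian_mixture`, the latent array `z`, the cluster count `k`, and the diagonal-covariance choice are all assumptions made for illustration.

```python
import numpy as np


def fit_gaussian_mixture(z, k, n_iter=50, seed=0, eps=1e-6):
    """Fit a diagonal-covariance Gaussian mixture to latent codes z (n x d).

    Illustrative only: plain textbook EM, not the paper's E-step.
    Returns mixture weights (k,), means (k, d), variances (k, d) and the
    responsibilities (n, k) computed in the final iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = z.shape
    # Initialise means from k distinct data points, unit variances, uniform weights.
    means = z[rng.choice(n, size=k, replace=False)].copy()
    variances = np.ones((k, d))
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # Responsibilities: log pi_j + log N(z_i | mu_j, diag(var_j)), normalised per sample.
        diff = z[:, None, :] - means[None, :, :]  # shape (n, k, d)
        log_pdf = -0.5 * (
            np.sum(diff**2 / variances, axis=2)
            + np.sum(np.log(2.0 * np.pi * variances), axis=1)
        )
        log_r = np.log(weights) + log_pdf
        log_r -= log_r.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # Closed-form parameter updates from the responsibilities.
        nk = resp.sum(axis=0) + eps  # eps guards against empty components
        weights = nk / nk.sum()
        means = (resp.T @ z) / nk[:, None]
        diff = z[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + eps

    return weights, means, variances, resp
```

In a pipeline like the one described above, `z` would be the current latent representations of the data, and the returned weights, means, and variances would define the mixture prior that the latent distribution is matched to. Hard cluster assignments can be read off as `resp.argmax(axis=1)`.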