Deep clustering critically depends on representations that expose clear cluster structure, yet most prior methods learn a single embedding with an autoencoder or a self-supervised encoder and treat it as the sole representation for clustering. In contrast, a pretrained diffusion model induces a rich representation trajectory over network layers and noise timesteps, along which clusterability varies substantially. We propose Diffusion Embedded Clustering (DiEC), an unsupervised clustering framework that exploits this trajectory by directly leveraging intermediate activations of a pretrained diffusion U-Net. DiEC formulates representation selection over the layer × timestep grid and adopts a practical two-stage procedure: it fixes the U-Net bottleneck as the Clustering Middle Layer (CML, l*) and then identifies the Clustering-Optimal Timestep (COT, t*) via an efficient subset-based, noise-averaged search. Conditioned on (l*, t*), DiEC learns clustering embeddings through a lightweight residual mapping, optimized with a DEC-style KL self-training objective and structural regularization, while a parallel random-timestep denoising-consistency loss stabilizes training and preserves diffusion behavior. Experiments on standard benchmarks demonstrate that DiEC achieves strong clustering performance and underscore the importance of selecting where, across layers and timesteps, diffusion representations are taken for clustering.
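As a rough illustration of the subset-based, noise-averaged timestep search sketched above, the following NumPy code scores each candidate timestep by the clusterability of features extracted on a data subset, averaged over several noise draws, and returns the best one. Everything here is an illustrative assumption rather than the paper's procedure: `extract_features` stands in for noising inputs to timestep `t` and reading the U-Net bottleneck (CML) activations, and the variance-ratio (Calinski-Harabasz-style) clusterability score is a placeholder for whatever criterion DiEC actually uses.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns cluster labels."""
    rng = np.random.default_rng(seed)
    mu = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        d = ((x[:, None] - mu[None]) ** 2).sum(-1)
        lab = d.argmin(1)
        for j in range(k):
            if (lab == j).any():
                mu[j] = x[lab == j].mean(0)
    return lab

def clusterability(x, lab):
    """Between/within variance ratio: higher = clearer cluster structure."""
    g = x.mean(0)
    b = sum((lab == j).sum() * ((x[lab == j].mean(0) - g) ** 2).sum()
            for j in np.unique(lab))
    w = sum(((x[lab == j] - x[lab == j].mean(0)) ** 2).sum()
            for j in np.unique(lab))
    return b / max(w, 1e-12)

def search_cot(extract_features, data, timesteps, k, n_noise=3):
    """Pick t* maximizing noise-averaged clusterability on a subset."""
    scores = {}
    for t in timesteps:
        per_noise = []
        for seed in range(n_noise):            # average over noise draws
            feats = extract_features(data, t, seed)
            per_noise.append(clusterability(feats, kmeans(feats, k, seed=seed)))
        scores[t] = float(np.mean(per_noise))
    return max(scores, key=scores.get), scores

# Toy demo (hypothetical): pretend t = 10 exposes two well-separated blobs.
def toy_extract(_, t, seed):
    rng = np.random.default_rng(seed)
    sep = 20.0 if t == 10 else 0.1
    return np.concatenate([rng.normal(0.0, 1.0, (30, 4)),
                           rng.normal(sep, 1.0, (30, 4))])

t_star, scores = search_cot(toy_extract, None, timesteps=[1, 10, 100], k=2)
```

Because only a subset of the data and a few noise seeds are used per timestep, the search stays cheap relative to clustering the full dataset at every candidate t.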
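The DEC-style KL self-training objective mentioned in the abstract has a standard form (soft assignments via a Student's-t kernel, sharpened targets, and a KL divergence between them). The NumPy sketch below shows that objective in isolation; the random `z` and `mu` are placeholders for the embeddings DiEC's residual mapping would produce from the (l*, t*) activations, and the structural regularizer and denoising-consistency loss are omitted.

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Student's-t soft assignment q_ij between embeddings z and centers mu."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # (n, k) sq. dists
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(1, keepdims=True)

def target_dist(q):
    """Sharpened target p_ij ∝ q_ij^2 / f_j, with cluster frequency f_j."""
    w = q ** 2 / q.sum(0)
    return w / w.sum(1, keepdims=True)

def kl_self_training_loss(q, p):
    """KL(P || Q): pushes soft assignments toward the sharpened targets."""
    return float((p * np.log(p / q)).sum())

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 16))   # stand-in for DiEC's clustering embeddings
mu = rng.normal(size=(5, 16))    # k = 5 cluster centers
q = soft_assign(z, mu)
p = target_dist(q)
loss = kl_self_training_loss(q, p)
```

In training, p is typically recomputed only every few epochs while q is differentiated through, so the sharpened targets act as slowly moving pseudo-labels.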