Deep clustering methods typically rely on a single, well-defined representation for clustering. In contrast, pretrained diffusion models provide abundant and diverse multi-scale representations across network layers and noise timesteps. A key challenge, however, is how to efficiently identify the most clustering-friendly representation in the layer × timestep space. To address this, we propose Diffusion Embedded Clustering (DiEC), an unsupervised framework that performs clustering by leveraging optimal intermediate representations from pretrained diffusion models. DiEC systematically evaluates the clusterability of representations along the trajectories of network depth and noise timestep, and employs an unsupervised search strategy to identify the Clustering-optimal Layer (COL) and Clustering-optimal Timestep (COT) in the layer × timestep space, improving clustering performance while reducing computational overhead. DiEC is fine-tuned primarily with a structure-preserving, DEC-style KL-divergence objective at the fixed COL and COT, together with a random-timestep denoising objective that preserves the generative capability of the pretrained model. Without relying on augmentation-based consistency constraints or contrastive learning, DiEC achieves strong clustering performance across multiple benchmark datasets. Code will be released upon acceptance.
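The DEC-style KL-divergence objective referenced above follows the standard Deep Embedded Clustering formulation. The sketch below is a minimal NumPy illustration of that standard formulation, not the authors' released code; `z` stands for embeddings extracted at the COL/COT, `mu` for learnable cluster centroids, and the function names are ours.

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Student's t-kernel soft assignment (DEC):
    q_ij ∝ (1 + ||z_i - mu_j||^2 / alpha)^(-(alpha+1)/2), normalized per sample."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # squared distances (n, k)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened auxiliary target: p_ij = (q_ij^2 / f_j) / sum_j'(q_ij'^2 / f_j'),
    with cluster frequencies f_j = sum_i q_ij."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q):
    """KL(P || Q), the structure-preserving fine-tuning objective."""
    return float((p * np.log(p / q)).sum())
```

In DEC-style training, `q` is recomputed each step from the current embeddings while `p` is refreshed periodically and treated as a fixed target, so minimizing KL(P‖Q) sharpens confident assignments without supervision.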