Deep clustering methods typically rely on a single, well-defined representation for clustering. In contrast, pretrained diffusion models provide abundant and diverse multi-scale representations across network layers and noise timesteps. A key challenge, however, is efficiently identifying the most clustering-friendly representation in this layer-timestep space. To address this, we propose Diffusion Embedded Clustering (DiEC), an unsupervised framework that performs clustering by leveraging optimal intermediate representations from pretrained diffusion models. DiEC systematically evaluates the clusterability of representations along the trajectories of network depth and noise timestep. To this end, we design an unsupervised search strategy that identifies the Clustering-optimal Layer (COL) and Clustering-optimal Timestep (COT) in the layer-timestep space of a pretrained diffusion model, improving clustering performance while reducing computational overhead. DiEC is fine-tuned primarily with a structure-preserving DEC-style KL-divergence objective at the fixed COL and COT, together with a random-timestep denoising objective that maintains the generative capability of the pretrained model. Without relying on augmentation-based consistency constraints or contrastive learning, DiEC achieves excellent clustering performance across multiple benchmark datasets.
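The DEC-style KL-divergence objective mentioned above follows the standard Deep Embedded Clustering formulation: a Student's t-kernel soft assignment between embeddings and cluster centroids, sharpened into an auxiliary target distribution, with the KL divergence between the two as the loss. A minimal NumPy sketch (the abstract does not give implementation details, so shapes and the `alpha` degrees-of-freedom parameter are assumptions):

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Soft assignment q_ij of embedding z_i to centroid mu_j (Student's t-kernel, as in DEC)."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)      # squared distances (n, k)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)                         # rows sum to 1

def target_distribution(q):
    """Sharpened auxiliary target p_ij = (q_ij^2 / f_j), renormalized per sample,
    where f_j = sum_i q_ij are the soft cluster frequencies."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def dec_kl_loss(p, q, eps=1e-12):
    """KL(P || Q) summed over all samples; non-negative, zero iff p == q."""
    return float((p * np.log((p + eps) / (q + eps))).sum())

# Hypothetical usage: embeddings taken from the fixed COL/COT representation.
z = np.random.default_rng(0).normal(size=(8, 4))   # 8 samples, 4-dim embeddings
mu = z[:3].copy()                                  # 3 centroids (illustrative init)
q = soft_assign(z, mu)
p = target_distribution(q)
loss = dec_kl_loss(p, q)
```

In DiEC this loss would be minimized jointly with the random-timestep denoising objective; the gradient-based fine-tuning loop itself is omitted here.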
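The unsupervised COL/COT search can be viewed as scoring each (layer, timestep) cell of the grid by the clusterability of its features and picking the argmax. The abstract does not specify the clusterability metric, so the sketch below assumes a variance-ratio (Calinski-Harabasz-style) criterion over naive k-means labels; both the metric and the `feature_grid` structure are illustrative, not the paper's actual procedure:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Naive Lloyd's k-means with farthest-first initialization; returns labels and centroids."""
    mu = [X[0]]
    for _ in range(k - 1):                          # farthest-first: spread initial centroids
        d = np.min([((X - c) ** 2).sum(axis=1) for c in mu], axis=0)
        mu.append(X[d.argmax()])
    mu = np.array(mu, dtype=float)
    for _ in range(iters):
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        lab = d.argmin(axis=1)
        for j in range(k):
            pts = X[lab == j]
            if len(pts):
                mu[j] = pts.mean(axis=0)
    return lab, mu

def variance_ratio(X, lab):
    """Between- over within-cluster variance ratio; higher = more clusterable."""
    g, k, n = X.mean(axis=0), lab.max() + 1, len(X)
    b = sum(len(X[lab == j]) * ((X[lab == j].mean(axis=0) - g) ** 2).sum() for j in range(k))
    w = sum(((X[lab == j] - X[lab == j].mean(axis=0)) ** 2).sum() for j in range(k))
    return b * (n - k) / (w * (k - 1) + 1e-12)

def search_col_cot(feature_grid, k):
    """feature_grid: {(layer, timestep): feature array}. Return the most clusterable cell."""
    scores = {key: variance_ratio(F, kmeans(F, k)[0]) for key, F in feature_grid.items()}
    return max(scores, key=scores.get)
```

Scoring every cell exhaustively is what makes the search expensive; a practical strategy would prune the grid (e.g., coarse-to-fine over timesteps), which is the overhead reduction the abstract alludes to.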