Existing multi-stage clustering methods independently learn the salient features from multiple views and then perform the clustering task. In particular, multi-view clustering (MVC) has attracted considerable attention in multi-view and multi-modal scenarios. MVC aims to explore common semantics and pseudo-labels across multiple views and to cluster in a self-supervised manner. However, limited by noisy data and inadequate feature learning, this clustering paradigm generates overconfident pseudo-labels that misguide the model into producing inaccurate predictions. It is therefore desirable to have a method that can correct this pseudo-label misguidance in multi-stage clustering and avoid bias accumulation. To alleviate the effect of overconfident pseudo-labels and improve the generalization ability of the model, this paper proposes a novel multi-stage deep MVC framework in which multi-view self-distillation (DistilMVC) is introduced to distill the dark knowledge of the label distribution. Specifically, in feature subspaces at different hierarchies, we explore the common semantics of multiple views through contrastive learning and obtain pseudo-labels by maximizing the mutual information between views. Additionally, a teacher network distills the pseudo-labels into dark knowledge, which supervises the student network and improves its predictive capability, thereby enhancing robustness. Extensive experiments on real-world multi-view datasets show that our method achieves better clustering performance than state-of-the-art methods.
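The teacher-student step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the usual temperature-softened formulation of knowledge distillation, in which the teacher's cluster logits are softened into a less confident distribution (the "dark knowledge") and the student is penalized with a KL divergence against it. The names `softmax`, `distillation_loss`, the temperature values, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of cluster logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Assumption: a standard distillation objective, used here only to
    illustrate the idea of supervising a student with softened labels.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over three clusters for a single sample.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]

# An overconfident pseudo-label distribution (temperature 1) versus the
# softened distribution that the student is trained against (temperature 4).
hard = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=4.0)
loss = distillation_loss(teacher_logits, student_logits, temperature=4.0)
```

Raising the temperature spreads probability mass away from the top cluster, so the student also sees how the teacher ranks the remaining clusters instead of a near one-hot target. In an actual training loop this loss would be minimized with respect to the student's parameters alongside the contrastive and mutual-information objectives.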