Incomplete multi-view clustering (IMVC) is an unsupervised learning task, and IMVC methods based on contrastive learning have received attention for their strong performance. Previous methods have two problems: 1) To counter dimensional collapse, in which latent features span only a lower-dimensional subspace during clustering, they rely heavily on additional projection heads, yet many of the parameters in these heads are unnecessary. 2) The recovered views contain inconsistent private information, and because consistency learning and reconstruction learning are performed on the same feature, useless private information misleads the learning of common semantics. To address these issues, we propose a novel incomplete multi-view contrastive clustering framework. The framework directly optimizes the latent feature subspace, using the learned feature vectors and their sub-vectors for reconstruction learning and consistency learning, and thereby avoids dimensional collapse without relying on projection heads. Because the reconstruction loss and the contrastive loss are applied to different features, the adverse effect of useless private information is reduced. For incomplete data, missing information is recovered by a cross-view prediction mechanism, and inconsistent information across views is discarded by minimizing conditional entropy, which further limits the influence of private information. Extensive experiments on 5 public datasets show that the method achieves state-of-the-art clustering results.
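The idea of applying the two losses to different features can be illustrated with a small NumPy sketch. It is not the paper's implementation. The abstract does not specify the architecture or the loss forms, so the following are all assumptions made for illustration: linear encoders and decoders, mean-squared-error reconstruction on the full latent vector, an InfoNCE-style contrastive loss on the leading `sub_dim` coordinates, and the names `info_nce` and `split_losses`.

```python
import numpy as np


def info_nce(a, b, temperature=0.5):
    """InfoNCE-style loss: row i of `a` is the positive for row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def split_losses(z1, z2, x1, x2, dec1, dec2, sub_dim):
    """Reconstruct from the full latent vectors; contrast only sub-vectors."""
    recon = np.mean((z1 @ dec1 - x1) ** 2) + np.mean((z2 @ dec2 - x2) ** 2)
    # Assumption: the sub-vector is the first `sub_dim` latent coordinates.
    contrast = info_nce(z1[:, :sub_dim], z2[:, :sub_dim])
    return recon, contrast


# Toy data: two views of n samples (hypothetical sizes, random values).
rng = np.random.default_rng(0)
n, d, latent_dim, sub_dim = 8, 16, 6, 3
x1 = rng.normal(size=(n, d))
x2 = x1 + 0.1 * rng.normal(size=(n, d))  # second view: noisy copy of the first

# Hypothetical linear encoders and decoders; no projection head is used.
enc1 = rng.normal(size=(d, latent_dim))
enc2 = rng.normal(size=(d, latent_dim))
dec1 = rng.normal(size=(latent_dim, d))
dec2 = rng.normal(size=(latent_dim, d))

z1, z2 = x1 @ enc1, x2 @ enc2
recon, contrast = split_losses(z1, z2, x1, x2, dec1, dec2, sub_dim)
print(f"reconstruction loss: {recon:.4f}, contrastive loss: {contrast:.4f}")
```

In this sketch the contrastive term only constrains the first `sub_dim` coordinates, while the reconstruction term sees the whole vector, so the remaining coordinates are free to carry view-specific information without entering the consistency objective.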