Scalable Incomplete Multi-View Clustering with Structure Alignment

The success of existing multi-view clustering (MVC) relies on the assumption that all views are complete. However, samples are often only partially available due to data corruption or sensor malfunction, which has motivated research on incomplete multi-view clustering (IMVC). Although several anchor-based IMVC methods have been proposed to process large-scale incomplete data, they still suffer from the following drawbacks: i) Most existing approaches neglect the inter-view discrepancy and enforce the cross-view representation to be consistent, which can corrupt the representation capability of the model; ii) Because different views observe different subsets of samples, the learned anchors might be misaligned, which we refer to as the Anchor-Unaligned Problem for Incomplete data (AUP-ID). The AUP-ID causes inaccurate graph fusion and degrades clustering performance. To tackle these issues, we propose a novel incomplete anchor graph learning framework termed Scalable Incomplete Multi-View Clustering with Structure Alignment (SIMVC-SA). Specifically, we construct view-specific anchor graphs to capture the complementary information from different views. To solve the AUP-ID, we propose a novel structure alignment module that refines the cross-view anchor correspondence. Meanwhile, anchor graph construction and alignment are jointly optimized in our unified framework to enhance clustering quality. By constructing anchor graphs instead of full graphs, the time and space complexity of the proposed SIMVC-SA is proven to be linear in the number of samples. Extensive experiments on seven incomplete benchmark datasets demonstrate the effectiveness and efficiency of our proposed method. Our code is publicly available at https://github.com/wy1019/SIMVC-SA.
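To make the anchor-misalignment issue concrete, the following is a minimal illustrative sketch and not the SIMVC-SA algorithm itself. It builds one anchor graph per view, leaving zero rows for missing samples, and then recovers the column (anchor) correspondence between two views from the samples they both observe. The function names `anchor_graph` and `align_columns`, the softmax-style similarity, the fixed anchors, and the exhaustive permutation search are all assumptions made for illustration; the proposed method instead learns anchors and their alignment jointly. For simplicity the toy data also reuses the same features in both views.

```python
import itertools
import numpy as np


def anchor_graph(X, anchors, observed, n_total):
    """Build an n_total x m anchor graph for one view (hypothetical helper).

    X        : (n_obs, d) features of the samples observed in this view
    anchors  : (m, d) anchor points for this view
    observed : (n_obs,) indices of the observed samples
    Rows of missing samples stay zero; observed rows sum to 1.
    """
    # Squared Euclidean distances between observed samples and anchors.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Softmax-style similarities: a simple stand-in for a learned graph.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)))
    w /= w.sum(axis=1, keepdims=True)
    Z = np.zeros((n_total, anchors.shape[0]))
    Z[observed] = w
    return Z


def align_columns(Z_ref, Z, shared):
    """Column permutation of Z that best matches Z_ref on the shared samples.

    Exhaustive search over permutations, so only sensible for a few anchors.
    """
    m = Z_ref.shape[1]
    score = Z_ref[shared].T @ Z[shared]  # (m, m) column-to-column agreement
    best = max(itertools.permutations(range(m)),
               key=lambda p: sum(score[i, p[i]] for i in range(m)))
    return np.array(best)


# Toy data: 12 samples around 3 well-separated centers.
rng = np.random.default_rng(0)
n, m = 12, 3
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.arange(n) % m
X_full = centers[labels] + 0.1 * rng.standard_normal((n, 2))

obs1 = np.arange(0, 10)  # view 1 misses samples 10 and 11
obs2 = np.arange(3, 12)  # view 2 misses samples 0, 1 and 2
shared = np.intersect1d(obs1, obs2)

Z1 = anchor_graph(X_full[obs1], centers, obs1, n)
# View 2 captures the same structure, but its anchors come out in another order.
true_perm = np.array([2, 0, 1])
Z2 = anchor_graph(X_full[obs2], centers[true_perm], obs2, n)

perm = align_columns(Z1, Z2, shared)
Z2_aligned = Z2[:, perm]  # columns now correspond to those of Z1
```

Each graph here is an n x m matrix rather than an n x n one, which is where the linear dependence on the number of samples comes from when m is fixed. Fusing `Z1` with the unaligned `Z2` would mix columns that describe different anchors, whereas `Z2_aligned` can be combined with `Z1` column by column.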
