Fusing information from different modalities can enhance data analysis tasks, including clustering. However, existing multi-view clustering (MVC) solutions are either limited to specific domains or rely on a two-stage procedure, representation learning followed by clustering, that is both suboptimal and computationally demanding. We propose an end-to-end deep learning-based MVC framework for general data (image, tabular, etc.). Our approach learns meaningful fused data representations with a novel permutation-based canonical correlation objective. Concurrently, we learn cluster assignments by identifying pseudo-labels that are consistent across multiple views. We demonstrate the effectiveness of our model on ten MVC benchmark datasets. Theoretically, we show that our model approximates the supervised linear discriminant analysis (LDA) representation. Additionally, we provide an error bound on the effect of false pseudo-label annotations.
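To make the idea of pseudo-labels that are consistent across views concrete, the following is a minimal sketch and not the paper's procedure. The abstract does not specify the selection rule, so everything here is an assumption for illustration: the function name `consistent_pseudo_labels`, the confidence `threshold`, and the premise that cluster indices are already aligned across views (for example, through a shared clustering head). A sample keeps a pseudo-label only when every view picks the same cluster with sufficient confidence.

```python
from typing import List, Optional, Sequence


def consistent_pseudo_labels(
    view_probs: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.8,
) -> List[Optional[int]]:
    """Return one pseudo-label per sample, or None when the views disagree.

    view_probs[v][i][k] is the probability that view v assigns sample i to
    cluster k. Cluster indices are assumed to be aligned across views
    (a hypothetical simplification, not stated in the source).
    """
    n_samples = len(view_probs[0])
    labels: List[Optional[int]] = []
    for i in range(n_samples):
        # Most likely cluster and its confidence in each view.
        picks = []
        for probs in view_probs:
            row = probs[i]
            k = max(range(len(row)), key=row.__getitem__)
            picks.append((k, row[k]))
        first = picks[0][0]
        agree = all(k == first for k, _ in picks)
        confident = all(p >= threshold for _, p in picks)
        # Keep the label only if all views agree and all are confident.
        labels.append(first if agree and confident else None)
    return labels


# Toy example: two views, three samples, two clusters.
view_a = [[0.9, 0.1], [0.6, 0.4], [0.15, 0.85]]
view_b = [[0.85, 0.15], [0.3, 0.7], [0.1, 0.9]]
labels = consistent_pseudo_labels([view_a, view_b], threshold=0.8)
print(labels)  # [0, None, 1]
```

In this toy example the second sample receives no pseudo-label because the two views disagree on its cluster. Raising `threshold` trades coverage for fewer false pseudo-labels, which is the kind of annotation error the stated bound concerns.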