The past two decades have seen increasingly rapid advances in multi-view representation learning, because it extracts useful information from diverse domains to support multi-view applications. However, the community faces two challenges: i) how to learn, from large amounts of unlabeled data, representations that are robust to noise and incomplete views, and ii) how to balance view consistency and complementarity for various downstream tasks. To this end, we use a deep fusion network to fuse view-specific representations into a view-common representation, extracting high-level semantics to obtain a robust representation. In addition, we employ a clustering task to guide the fusion network and keep it from collapsing to trivial solutions. To balance consistency and complementarity, we design an asymmetrical contrastive strategy that aligns the view-common representation with each view-specific representation. These modules are incorporated into a unified method known as CLustering-guided cOntrastiVE fusioN (CLOVEN). We quantitatively and qualitatively evaluate the proposed method on five datasets, demonstrating that CLOVEN outperforms 11 competitive multi-view learning methods in clustering and classification. In the incomplete-view scenario, our method resists noise interference better than the competing methods. Furthermore, the visualization analysis shows that CLOVEN preserves the intrinsic structure of the view-specific representations while also improving the compactness of the view-common representation. Our source code will be available soon at https://github.com/guanzhou-ke/cloven.
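The alignment between the view-common representation and each view-specific representation can be illustrated with a small, dependency-free sketch. This is not the authors' implementation, and the abstract does not specify the loss. The sketch makes three assumptions: an element-wise mean stands in for the deep fusion network, a one-directional InfoNCE-style loss is taken as one possible reading of "asymmetrical", and the clustering guidance is omitted entirely. All function names (`fuse`, `one_way_info_nce`, `alignment_loss`) and the `temperature` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def fuse(view_reps):
    """Assumed stand-in for the fusion network: element-wise mean over views."""
    n_views = len(view_reps)
    return [sum(vals) / n_views for vals in zip(*view_reps)]


def one_way_info_nce(anchors, candidates, temperature=0.5):
    """InfoNCE from anchors to candidates only; there is no reverse term.

    candidates[i] is the positive for anchors[i]; every other candidate
    in the batch acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def alignment_loss(views, temperature=0.5):
    """views[v][i] is the representation of sample i under view v.

    The fused (view-common) representation is the anchor; each
    view-specific batch supplies the positives and negatives.
    """
    n_samples = len(views[0])
    common = [fuse([view[i] for view in views]) for i in range(n_samples)]
    per_view = [one_way_info_nce(common, view, temperature) for view in views]
    return sum(per_view) / len(per_view)
```

Under these assumptions, `alignment_loss([view_a, view_b])` is lower when sample `i` of `view_a` and sample `i` of `view_b` describe the same underlying object than when one view is shuffled, which is the behavior a consistency objective is meant to reward.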