Multi-view representation learning captures comprehensive information from multiple views of a shared context. Recent works intuitively apply contrastive learning (CL) to learn representations in a pairwise manner, which still leaves room for improvement: view-specific noise is not filtered out when learning view-shared representations; fake negative pairs, in which the negative term actually belongs to the same class as the positive, are treated the same as real negative pairs; and measuring the similarities between terms evenly might interfere with optimization. Importantly, few works study the theoretical framework of generalized self-supervised multi-view learning, especially for more than two views. To this end, we rethink the existing multi-view learning paradigm from an information-theoretic perspective and propose a novel information-theoretic framework for generalized multi-view learning. Guided by it, we build a multi-view coding method with a three-tier progressive architecture, namely Information theory-guided heuristic Progressive Multi-view Coding (IPMC). In the distribution tier, IPMC aligns the distributions between views to reduce view-specific noise. In the set tier, IPMC builds self-adjusted pools for contrasting, using a view filter to adaptively modify the pools. Lastly, in the instance tier, we adopt a designed unified loss to learn discriminative representations and reduce gradient interference. Theoretically and empirically, we demonstrate the superiority of IPMC over state-of-the-art methods.
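For orientation, the pairwise contrastive baseline that the abstract critiques can be sketched as follows. This is not IPMC itself: it is a minimal NumPy illustration of a standard InfoNCE loss averaged over all ordered pairs of views, and the function names `info_nce` and `pairwise_multiview_loss`, the temperature of 0.1, and the toy data are assumptions made only for this example.

```python
import itertools

import numpy as np


def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE loss where row i of `anchor` is paired with row i of `positive`."""
    # L2-normalize so that dot products are cosine similarities.
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries are the positive pairs. Every off-diagonal entry is
    # treated as a negative with equal weight, even if it shares a class with
    # the anchor (the "fake negative" issue described in the abstract).
    return -np.mean(np.diag(log_prob))


def pairwise_multiview_loss(views, temperature=0.1):
    """Average InfoNCE over all ordered pairs of views (hypothetical helper)."""
    pairs = list(itertools.permutations(range(len(views)), 2))
    total = sum(info_nce(views[i], views[j], temperature) for i, j in pairs)
    return total / len(pairs)


# Toy usage: three noisy views of the same 8 samples (assumed data).
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))
views = [shared + 0.05 * rng.normal(size=shared.shape) for _ in range(3)]
loss = pairwise_multiview_loss(views)
```

Note that the number of pairwise terms grows quadratically with the number of views, and nothing in this baseline filters view-specific noise or distinguishes fake negatives from real ones; these are the gaps the three tiers of IPMC are described as addressing.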