Modeling Multiple Views via Implicitly Preserving Global Consistency and Local Complementarity

Self-supervised learning techniques are often used to mine implicit knowledge from unlabeled data by modeling multiple views, but it remains unclear how to learn effective representations in a complex and inconsistent context. To this end, we propose the consistency and complementarity network (CoCoNet), which uses strict global inter-view consistency and local cross-view complementarity preserving regularization to comprehensively learn representations from multiple views. At the global stage, we argue that the crucial knowledge is implicitly shared among views, and that enhancing the encoder to capture such knowledge from data can improve the discriminability of the learned representations. Hence, preserving the global consistency of multiple views ensures the acquisition of common knowledge. CoCoNet aligns the probability distributions of the views using an efficient discrepancy metric based on the generalized sliced Wasserstein distance. At the local stage, we propose a heuristic complementarity-factor that combines cross-view discriminative knowledge and guides the encoders to learn not only view-wise discriminability but also cross-view complementary information. Theoretically, we provide an information-theoretic analysis of the proposed CoCoNet. Empirically, to investigate the gains of our approach, we conduct extensive experiments, which demonstrate that CoCoNet outperforms state-of-the-art self-supervised methods by a significant margin. This shows that such implicit consistency and complementarity preserving regularization can enhance the discriminability of latent representations.
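
The global alignment step can be illustrated with a minimal sketch. The code below estimates the plain sliced Wasserstein distance between the embeddings of two views, using random linear projections. The generalized variant named in the abstract replaces these linear projections with nonlinear ones, which is not shown here. The function name, its parameters, and the assumption of equally sized batches are illustrative choices, not CoCoNet's actual implementation.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=128, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n, d) holding embeddings of two views.
    This sketch assumes both samples have the same size n.
    """
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally sized samples")
    rng = np.random.default_rng(seed)

    # Draw random directions and normalize them to unit length.
    theta = rng.normal(size=(x.shape[1], n_projections))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)

    # Project both samples onto every direction, then sort each
    # one-dimensional projection independently.
    x_proj = np.sort(x @ theta, axis=0)
    y_proj = np.sort(y @ theta, axis=0)

    # In one dimension, optimal transport pairs the sorted samples,
    # so the cost is the mean p-th power gap between order statistics.
    return np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    view_a = rng.normal(size=(64, 8))          # hypothetical embeddings, view A
    view_b = view_a + 0.5                      # hypothetical shifted view B
    print(sliced_wasserstein(view_a, view_a))  # identical samples
    print(sliced_wasserstein(view_a, view_b))  # shifted samples
```

Used as a regularizer, a quantity of this kind would be minimized so that the embedding distributions of different views move closer together. Sorting makes each one-dimensional transport problem cheap, which is the usual reason sliced variants are described as efficient.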
