Multi-view representation learning aims to derive robust representations that are both view-consistent and view-specific from diverse data sources. This paper presents an in-depth analysis of existing approaches in this domain, highlighting a commonly overlooked aspect: the redundancy between view-consistent and view-specific representations. To this end, we propose an innovative framework for multi-view representation learning, which incorporates a technique we term 'distilled disentangling'. Our method introduces the concept of masked cross-view prediction, enabling the extraction of compact, high-quality view-consistent representations from various sources without incurring extra computational overhead. Additionally, we develop a distilled disentangling module that efficiently filters out consistency-related information from multi-view representations, resulting in purer view-specific representations. This approach significantly reduces redundancy between view-consistent and view-specific representations, enhancing the overall efficiency of the learning process. Our empirical evaluations reveal that higher mask ratios substantially improve the quality of view-consistent representations. Moreover, we find that reducing the dimensionality of view-consistent representations relative to that of view-specific representations further refines the quality of the combined representations. Our code is accessible at: https://github.com/Guanzhou-Ke/MRDD.
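As a rough illustration of two ideas named in the abstract (hiding a large fraction of a view's input before prediction, and pairing a smaller view-consistent vector with a larger view-specific one), the following Python sketch may help. It is not the MRDD implementation: the function names `random_mask` and `combine`, the zero-fill masking strategy, the 0.8 mask ratio, and the 2-versus-6 dimension split are all assumptions chosen for illustration, and no learning takes place here.

```python
import random


def random_mask(features, mask_ratio, seed=0):
    """Hide a fixed fraction of entries by zero-filling them.

    Returns (masked_features, mask), where mask[i] is True for hidden entries.
    Zero-filling is an assumption; the abstract does not say how masking is done.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    n = len(features)
    n_masked = int(round(n * mask_ratio))
    rng = random.Random(seed)
    hidden = set(rng.sample(range(n), n_masked))
    mask = [i in hidden for i in range(n)]
    masked = [0.0 if m else v for v, m in zip(features, mask)]
    return masked, mask


def combine(consistent, specific):
    """Concatenate a shared consistent vector with one view's specific vector."""
    return list(consistent) + list(specific)


# A toy "view" with ten non-zero features, masked at a high ratio.
view = [float(i + 1) for i in range(10)]
masked, mask = random_mask(view, mask_ratio=0.8)
print(sum(mask), "of", len(view), "entries hidden")

# Hypothetical sizes: the consistent part (2 dims) is smaller than
# the specific part (6 dims), mirroring the abstract's observation.
z = combine([0.1, -0.3], [0.5] * 6)
print(len(z))
```

In an actual model, the masked view would be passed to an encoder trained to predict the other views, and both vectors would come from learned networks rather than hand-written lists.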