Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers

Multi-view data is more expressive than single-view data, and multi-label annotation carries richer supervision than single-label annotation, which makes multi-view multi-label learning widely applicable to pattern recognition tasks. This representation learning problem poses three main challenges: i) how to learn consistent representations of samples across all views; ii) how to exploit the category correlations among labels to guide inference; and iii) how to avoid the negative impact of incomplete views or labels. To address these challenges, we propose a general multi-view multi-label learning framework named label-guided masked view- and category-aware transformers. First, we design two transformer-style modules, one for cross-view feature aggregation and one for multi-label classification. The former aggregates information from different views while extracting view-specific features, and the latter learns subcategory embeddings to improve classification performance. Second, because views differ in expressive power, we propose an adaptively weighted view fusion module to obtain view-consistent embedding features. Third, we impose a label manifold constraint on sample-level representation learning to make full use of the supervised information. Finally, all modules are designed under the premise of incomplete views and labels, which makes our method adaptable to arbitrary multi-view and multi-label data. Extensive experiments on five datasets show that our method has clear advantages over other state-of-the-art methods.
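To make the role of incomplete views and labels concrete, the following is a minimal NumPy sketch of two ideas mentioned above: a weighted view fusion that skips missing views, and a classification loss that skips unknown labels. The abstract does not give the exact formulas, so the softmax weighting, the binary cross-entropy loss, and all names (`masked_weighted_fusion`, `masked_bce`, `view_logits`, `view_mask`, `label_mask`) are illustrative assumptions rather than the method's actual definitions.

```python
import numpy as np

def masked_weighted_fusion(view_embeddings, view_mask, view_logits):
    """Fuse per-view embeddings into one embedding per sample.

    view_embeddings: (n, V, d) array; entries of missing views may be anything (even NaN).
    view_mask:       (n, V) array, 1 where the view is available, 0 where it is missing.
    view_logits:     (V,) array of per-view scores (assumed learnable; hypothetical).
    """
    view_mask = np.asarray(view_mask)
    assert (view_mask.sum(axis=1) > 0).all(), "each sample needs at least one view"
    # Softmax over available views only: missing views get a score of -inf.
    scores = np.where(view_mask > 0, view_logits[None, :], -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Zero out missing-view embeddings so placeholder values cannot leak in.
    safe = np.where(view_mask[:, :, None] > 0, view_embeddings, 0.0)
    return (weights[:, :, None] * safe).sum(axis=1)

def masked_bce(probs, labels, label_mask, eps=1e-7):
    """Binary cross-entropy averaged over known labels only (assumed loss form)."""
    label_mask = np.asarray(label_mask, dtype=float)
    p = np.clip(probs, eps, 1.0 - eps)
    # Replace unknown labels by 0 before the arithmetic; the mask removes them anyway.
    y = np.where(label_mask > 0, labels, 0.0)
    per_entry = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return (per_entry * label_mask).sum() / max(label_mask.sum(), 1.0)
```

With equal `view_logits`, the fusion reduces to a plain average over the views each sample actually has; unequal scores let more informative views dominate, which is the intuition behind adaptive weighting.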
