Multimodal Graph Negative Learning

Multimodal attributed graphs (MAGs) integrate graph topology with heterogeneous modality attributes, such as text and images, enabling richer modeling of complex relational systems. However, this expressiveness also makes learning on MAGs depend on multiple semantic sources, including structural topology, textual attributes, and visual attributes, each of which can be regarded as a branch for node representation. Node-level branch semantic imbalance arises when these branches differ across nodes in semantic informativeness and reliability: a branch that provides discriminative semantics for one node may mislead another because of bias in modality quality or structural context. Existing methods often mitigate such heterogeneity through cross-branch agreement or alignment, implicitly treating the dominant branch's prediction as reliable supervision. When the dominant branch is biased, forced imitation may propagate its bias to the other branches and suppress original semantics that are useful for classification. We propose GraphMNL, a graph-aware multimodal negative learning framework that addresses this issue by using Negative Learning as cross-branch guidance. Instead of forcing inferior branches to imitate a teacher prediction, GraphMNL teaches them which classes a node is unlikely to belong to. GraphMNL builds a branch library, identifies dominant and inferior branches via graph-aware reliability arbitration, gates unstable transfer, and applies target-preserving negative learning over non-target classes. This design decouples target supervision from branch guidance: supervised losses learn the correct class, while Negative Learning suppresses unlikely alternatives when branch agreement is unreliable. In comprehensive experiments, GraphMNL achieves the best performance, with 72.47% accuracy on the Grocery dataset and a 76.60 F1 score on the Reddit M dataset.
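The abstract does not give the loss in closed form, so the following is only a minimal sketch of what target-preserving negative learning over non-target classes could look like. It assumes the common complementary-label form, -log(1 - p_c), applied to the inferior branch, and it assumes the negative classes are the k non-target classes that the dominant branch rates least likely. The function name, its parameters, and this selection rule are hypothetical illustrations, not GraphMNL's actual formulation; reliability arbitration and gating are omitted.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def negative_learning_loss(inferior_logits, dominant_logits, targets, k=2, eps=1e-12):
    """Hypothetical target-preserving negative learning loss.

    inferior_logits, dominant_logits: arrays of shape (num_nodes, num_classes).
    targets: integer class labels of shape (num_nodes,).
    k: number of negative classes per node (assumed <= num_classes - 1).
    Returns the scalar loss and the chosen negative class indices.
    """
    p_inf = softmax(inferior_logits)
    p_dom = softmax(dominant_logits)
    rows = np.arange(p_dom.shape[0])

    # Target-preserving (assumption): the labeled class can never be
    # selected as a negative, so give it an infinite score before ranking.
    scores = p_dom.copy()
    scores[rows, targets] = np.inf

    # Assumed selection rule: the k classes the dominant branch
    # considers least likely become the negative classes.
    neg_idx = np.argsort(scores, axis=1)[:, :k]

    # Complementary-label loss: push the inferior branch's probability
    # on each negative class toward zero.
    p_neg = np.take_along_axis(p_inf, neg_idx, axis=1)
    loss = -np.log(np.clip(1.0 - p_neg, eps, None)).mean()
    return loss, neg_idx


# Toy usage: one node, four classes, target class 0.
inferior = np.array([[2.0, 1.0, 0.0, -1.0]])
dominant = np.array([[3.0, 0.5, -2.0, -1.0]])
loss, negatives = negative_learning_loss(inferior, dominant, np.array([0]), k=2)
```

Because the loss only touches the selected non-target classes, it leaves the inferior branch's target-class probability to the supervised loss, which is one way to read the decoupling described above.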
