Multimodal Graph Negative Learning

Multimodal attributed graphs (MAGs) integrate graph topology with heterogeneous modality attributes, such as text and images, enabling richer modeling of complex relational systems. However, this expressiveness also makes learning on MAGs depend on multiple semantic sources, namely structural topology and textual and visual attributes, each of which can be regarded as a branch for node representation. Node-level branch semantic imbalance arises when these branches differ in semantic informativeness and reliability from node to node: a branch that provides discriminative semantics for one node may mislead another because of biases in modality quality or structural context. Existing methods often mitigate this heterogeneity through cross-branch agreement or alignment, implicitly treating the dominant branch's prediction as reliable supervision. When the dominant branch is biased, however, forced imitation can propagate its bias to the other branches and suppress their original semantics that are useful for classification. We propose GraphMNL, a graph-aware multimodal negative learning framework that addresses this issue by using Negative Learning as cross-branch guidance. Instead of forcing inferior branches to imitate a teacher prediction, GraphMNL teaches them which classes a node is unlikely to belong to. GraphMNL builds a branch library, identifies dominant and inferior branches via graph-aware reliability arbitration, gates unstable transfer, and applies target-preserving negative learning over non-target classes. This design decouples target supervision from branch guidance: the supervised losses learn the correct class, while Negative Learning suppresses unlikely alternatives when branch agreement is unreliable. Comprehensive experiments show that GraphMNL achieves the best performance, reaching 72.47% accuracy on the Grocery dataset and a 76.60 F1 score on the Reddit M dataset.
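The abstract does not give GraphMNL's exact formulas, so the following is only a minimal sketch of the idea for a single labeled node, assuming the standard complementary-label form of negative learning, -log(1 - p_c), for each class c the node should not belong to. The helper names, the reliability score (the branch's probability on the target class, which ignores the graph-aware part of the arbitration), the `gate` threshold, and the rule that takes the `k` least-likely non-target classes from the dominant branch are all hypothetical stand-ins, not the authors' method.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one node's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_classes(teacher_probs: Sequence[float], target: int, k: int) -> List[int]:
    """Hypothetical rule: the k non-target classes the dominant branch finds least likely.

    The target class is always excluded, so guidance never pushes against the label
    (the 'target-preserving' part)."""
    candidates = [c for c in range(len(teacher_probs)) if c != target]
    candidates.sort(key=lambda c: teacher_probs[c])
    return candidates[:k]


def negative_learning_loss(student_probs: Sequence[float], neg: Sequence[int],
                           eps: float = 1e-12) -> float:
    """Complementary-label loss: mean of -log(1 - p_c) over the negative classes."""
    if not neg:
        return 0.0
    return sum(-math.log(max(1.0 - student_probs[c], eps)) for c in neg) / len(neg)


def guidance_loss(branch_logits: Dict[str, Sequence[float]], target: int,
                  k: int = 2, gate: float = 0.5) -> float:
    """Cross-branch negative-learning guidance for one labeled node.

    Assumptions (not from the paper): reliability = probability on the target class,
    and transfer is gated off when the dominant branch's reliability is below `gate`."""
    probs = {name: softmax(z) for name, z in branch_logits.items()}
    reliability = {name: p[target] for name, p in probs.items()}
    dominant = max(reliability, key=reliability.get)
    if reliability[dominant] < gate:
        return 0.0  # unstable transfer: skip cross-branch guidance for this node
    neg = negative_classes(probs[dominant], target, k)
    inferior = [name for name in probs if name != dominant]
    if not inferior:
        return 0.0
    return sum(negative_learning_loss(probs[n], neg) for n in inferior) / len(inferior)


def total_loss(branch_logits: Dict[str, Sequence[float]], target: int,
               lam: float = 0.5, **kw) -> float:
    """Decoupled objective: per-branch cross-entropy learns the target class,
    while the weighted negative-learning term only suppresses unlikely classes."""
    ce = sum(-math.log(softmax(z)[target]) for z in branch_logits.values())
    return ce / len(branch_logits) + lam * guidance_loss(branch_logits, target, **kw)
```

For example, `guidance_loss({"graph": [5.0, 0.0, 0.0, 0.0], "text": [0.0, 0.0, 0.0, 0.0]}, target=0)` treats the graph branch as dominant (about 0.98 on class 0, so the gate opens), picks classes 1 and 2 as negatives, and returns -log(0.75), about 0.288, for the uniform text branch. Note that the inferior branch is never told to copy the dominant distribution; it is only penalized for mass on classes the dominant branch rules out, which is the contrast with agreement-based distillation that the abstract draws.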

翻译：多模态属性图（MAGs）通过融合图拓扑结构与异构模态属性（如文本和图像），从而实现对复杂关系系统的更丰富建模。然而，这种表达能力也使得基于MAGs的学习依赖于多个语义来源，包括结构拓扑、文本属性和视觉属性，每个来源均可视为节点表示的一个分支。当这些分支在语义信息性和可靠性上存在节点级差异时，会产生节点级分支语义不平衡：一个为某节点提供判别性语义的分支，可能因模态质量或结构上下文的偏差而误导另一节点。现有方法通常通过跨分支一致性或对齐来缓解这种异质性，隐式地将主导预测视为可靠监督。当主导分支存在偏差时，强制模仿可能将其偏差传播至其他分支，并抑制对分类有用的原始语义。本文提出GraphMNL，一种图感知多模态负学习框架，通过使用负学习作为跨分支指导来解决该问题。该方法并非强制弱势分支模仿教师预测，而是教导模型节点不可能属于哪些类别。GraphMNL构建分支库，通过图感知可靠性仲裁识别主导分支与弱势分支，门控不稳定传递，并对非目标类别应用目标保持负学习。该设计将目标监督与分支指导解耦，使监督损失学习正确类别，而在分支一致性不可靠时，负学习抑制不可能候选类别。通过广泛的实验评估，GraphMNL在Grocery数据集上达到72.47%的准确率，在Reddit M数据集上达到76.60的F1分数，取得了最优性能。