Traditional machine learning methods rely heavily on the independent and identically distributed (i.i.d.) assumption, which limits them when the test distribution deviates from the training distribution. To address this issue, out-of-distribution (OOD) generalization, which aims to achieve satisfactory generalization performance under unknown distribution shifts, has made significant progress. However, OOD generalization for graph-structured data remains poorly understood and relatively unexplored, owing to two primary challenges. First, distribution shifts on graphs often occur simultaneously in node attributes and graph topology. Second, capturing invariant information under diverse distribution shifts is difficult. To overcome these obstacles, we introduce a novel framework, Graph Learning Invariant Domain genERation (GLIDER). Its goals are to (1) diversify variations across domains by modeling potential seen or unseen variations in attribute distribution and topological structure, and (2) minimize the discrepancy among these variations in a representation space whose target is to predict semantic labels. Extensive experimental results indicate that our model outperforms baseline methods on node-level OOD generalization across domains when distribution shifts occur in node features and topological structure simultaneously.