Mind the Label Shift of Augmentation-based Graph OOD Generalization

Out-of-distribution (OOD) generalization is an important problem for Graph Neural Networks (GNNs). Recent works apply various graph edits to generate augmented environments and then learn an invariant GNN for generalization. However, label shift often occurs during such augmentation, because editing the graph structure can alter the graph label. This yields inconsistent predictive relationships across augmented environments, which harms generalization. To address this issue, we propose \textbf{LiSA}, which generates label-invariant augmentations to facilitate graph OOD generalization. Instead of resorting to graph edits, LiSA exploits \textbf{L}abel-\textbf{i}nvariant \textbf{S}ubgraphs of the training graphs to construct \textbf{A}ugmented environments. Specifically, LiSA first designs variational subgraph generators that extract locally predictive patterns and efficiently construct multiple label-invariant subgraphs. The subgraphs produced by different generators are then collected to build different augmented environments. To promote diversity among these environments, LiSA further introduces a tractable energy-based regularization that enlarges the pairwise distances between the environments' distributions. In this manner, LiSA generates diverse augmented environments with a consistent predictive relationship, which facilitates learning an invariant GNN. Extensive experiments on node-level and graph-level OOD benchmarks show that LiSA achieves impressive generalization performance with different GNN backbones. Code is available at \url{https://github.com/Samyu0304/LiSA}.
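
The abstract does not give the exact form of the energy-based diversity regularizer. As a rough illustration of the idea of "enlarging pairwise distances between the distributions of environments", the sketch below scores every pair of augmented environments with the classical energy distance between their embedding samples and returns the negated mean as a penalty to minimize. This is an assumption made for illustration, not LiSA's published regularizer: the function names `energy_distance` and `diversity_penalty` are hypothetical, and each environment is represented simply as a list of (hypothetical) subgraph embedding vectors.

```python
import itertools
import math
from typing import List, Sequence

Vector = Sequence[float]


def _mean_pairwise_dist(xs: List[Vector], ys: List[Vector]) -> float:
    """Average Euclidean distance over all pairs (x, y) with x in xs and y in ys."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def energy_distance(xs: List[Vector], ys: List[Vector]) -> float:
    """V-statistic energy distance between two empirical distributions.

    D(X, Y) = 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||. It is >= 0 and equals 0
    only when the two empirical distributions coincide.
    """
    return (2 * _mean_pairwise_dist(xs, ys)
            - _mean_pairwise_dist(xs, xs)
            - _mean_pairwise_dist(ys, ys))


def diversity_penalty(envs: List[List[Vector]]) -> float:
    """Negative mean pairwise energy distance across augmented environments.

    Illustrative stand-in only (hypothetical, not LiSA's exact regularizer):
    minimizing this value pushes every pair of environments apart.
    """
    pairs = list(itertools.combinations(envs, 2))
    if not pairs:
        return 0.0
    return -sum(energy_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    env_a = [[0.0, 0.0], [0.0, 1.0]]
    env_b = [[0.0, 0.0], [0.0, 1.0]]  # identical to env_a
    env_c = [[5.0, 5.0], [5.0, 6.0]]  # env_a shifted by (5, 5)
    print(energy_distance(env_a, env_b))  # 0.0
    print(diversity_penalty([env_a, env_c]) < diversity_penalty([env_a, env_b]))  # True
```

Two environments with identical embeddings contribute nothing to the penalty, while environments whose distributions are far apart lower it. The energy distance was chosen here only because it compares whole distributions from samples without density estimation; in a real training loop this term would be computed on differentiable embeddings and added, with some weight, to the invariant-learning objective.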

翻译：分布外泛化（Out-of-distribution, OOD）是图神经网络（Graph Neural Networks, GNNs）的一个重要问题。近期研究采用不同的图编辑手段生成增强环境，并学习不变性GNN以实现泛化。然而，由于图结构编辑不可避免地改变图标签，增强过程中常出现标签偏移。这导致增强环境间预测关系不一致，对泛化有害。为解决此问题，我们提出\textbf{LiSA}，通过生成标签不变的增强数据来促进图分布外泛化。不同于依赖图编辑，LiSA利用训练图的\textbf{L}abel-\textbf{i}nvariant \textbf{S}ubgraphs（标签不变子图）构建\textbf{A}ugmented environments（增强环境）。具体而言，LiSA首先设计变分子图生成器，高效提取局部预测模式并构建多个标签不变子图。然后，收集不同生成器产生的子图以构建多样化的增强环境。为促进增强环境的多样性，LiSA进一步引入基于能量的可解正则化项，扩大环境间分布的成对距离。通过这种方式，LiSA生成具有一致预测关系的多样化增强环境，并促进不变性GNN的学习。在节点级和图级分布外泛化基准上的大量实验表明，LiSA在不同GNN骨干网络下均能实现出色的泛化性能。代码开源地址：\url{https://github.com/Samyu0304/LiSA}。