Invariant graph representation learning aims to learn the invariance among data from different environments in order to achieve out-of-distribution (OOD) generalization on graphs. Because graph environment partitions are usually expensive to obtain, augmenting the environment information has become the de facto approach. However, the usefulness of the augmented environment information has never been verified. In this work, we find that it is fundamentally impossible to learn invariant graph representations via environment augmentation without additional assumptions. We therefore develop a set of minimal assumptions, including variation sufficiency and variation consistency, that make invariant graph learning feasible. We then propose a new framework, Graph invAriant Learning Assistant (GALA). GALA incorporates an assistant model that needs to be sensitive to graph environment changes or distribution shifts. The correctness of the assistant model's proxy predictions can therefore differentiate the variations in spurious subgraphs. We show that, under the established minimal assumptions, extracting the subgraph that is maximally invariant to the proxy predictions provably identifies the underlying invariant subgraph for successful OOD generalization. Extensive experiments on datasets including DrugOOD, covering various graph distribution shifts, confirm the effectiveness of GALA.
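As a rough illustration of how the correctness of proxy predictions could be used to separate samples, the sketch below splits sample indices by whether a hypothetical assistant model predicted the label correctly, then pairs same-label samples across the two groups. The function names, the input format (plain lists of predictions and labels), and the cross-group pairing step are assumptions made for illustration only; the abstract does not specify how GALA consumes the proxy predictions.

```python
def split_by_proxy_correctness(proxy_preds, labels):
    """Partition sample indices by whether the assistant's proxy prediction is correct.

    Hypothetical helper: `proxy_preds` and `labels` are equal-length sequences
    of class ids, one entry per graph.
    """
    if len(proxy_preds) != len(labels):
        raise ValueError("proxy_preds and labels must have the same length")
    correct, incorrect = [], []
    for idx, (pred, label) in enumerate(zip(proxy_preds, labels)):
        (correct if pred == label else incorrect).append(idx)
    return correct, incorrect


def same_label_cross_group_pairs(labels, correct, incorrect):
    """Pair each correctly predicted sample with every incorrectly predicted
    sample that has the same label.

    Assumption: samples with the same label but different proxy correctness
    are treated as differing in their spurious subgraph.
    """
    by_label = {}
    for j in incorrect:
        by_label.setdefault(labels[j], []).append(j)
    return [(i, j) for i in correct for j in by_label.get(labels[i], [])]


# Toy data: five graphs, binary labels, made-up assistant predictions.
proxy_preds = [0, 1, 1, 0, 1]
labels = [0, 1, 0, 0, 0]

correct, incorrect = split_by_proxy_correctness(proxy_preds, labels)
pairs = same_label_cross_group_pairs(labels, correct, incorrect)
print(correct, incorrect, pairs)
```

In this toy run, graphs 0, 1, and 3 fall in the correctly predicted group and graphs 2 and 4 in the other; only label-0 graphs appear in both groups, so only they form cross-group pairs.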