Few-Shot Causal Representation Learning for Out-of-Distribution Generalization on Heterogeneous Graphs

Heterogeneous graph few-shot learning (HGFL) has been developed to address the label sparsity issue in heterogeneous graphs (HGs), which consist of various types of nodes and edges. The core concept of HGFL is to extract knowledge from richly labeled classes in a source HG, transfer this knowledge to a target HG to facilitate learning new classes from only a few labeled training examples, and finally make predictions on unlabeled testing data. Existing methods typically assume that the source HG, training data, and testing data all share the same distribution. In practice, however, distribution shifts among these three types of data are inevitable for two reasons: (1) the limited availability of source HGs that match the target HG distribution, and (2) the unpredictable data generation mechanism of the target HG. Such distribution shifts result in ineffective knowledge transfer and poor learning performance in existing methods, which gives rise to a novel problem: out-of-distribution (OOD) generalization in HGFL. To address this challenging problem, we propose a novel Causal OOD Heterogeneous graph Few-shot learning model, namely COHF. In COHF, we first characterize distribution shifts in HGs with a structural causal model, establishing an invariance principle for OOD generalization in HGFL. Then, following this invariance principle, we propose a new variational autoencoder-based heterogeneous graph neural network to mitigate the impact of distribution shifts. Finally, by integrating this network with a novel meta-learning framework, COHF effectively transfers knowledge to the target HG to predict new classes from few labeled examples. Extensive experiments on seven real-world datasets demonstrate the superior performance of COHF over state-of-the-art methods.
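To make the few-shot setting concrete, the following is a minimal sketch of how a single N-way K-shot episode (a small labeled support set plus a query set to predict) might be sampled from labeled nodes. It is not COHF's procedure: the function `sample_episode`, its parameters, and the toy node and class names are all hypothetical and illustrate only the generic episodic setup that meta-learning approaches commonly use.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, q_query, rng):
    """Sample one N-way K-shot episode from a {node_id: class_label} mapping.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Group labeled nodes by class.
    by_class = defaultdict(list)
    for node, label in labels.items():
        by_class[label].append(node)

    # Only classes with enough labeled nodes can fill support + query sets.
    eligible = sorted(c for c, nodes in by_class.items()
                      if len(nodes) >= k_shot + q_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient labeled nodes")

    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        # Draw disjoint support and query nodes for this class.
        picked = rng.sample(by_class[c], k_shot + q_query)
        support += [(n, c) for n in picked[:k_shot]]
        query += [(n, c) for n in picked[k_shot:]]
    return support, query


# Toy data (assumed): 4 classes with 6 labeled nodes each.
toy_labels = {f"paper_{c}_{i}": f"class_{c}"
              for c in range(4) for i in range(6)}
support, query = sample_episode(toy_labels, n_way=2, k_shot=3, q_query=2,
                                rng=random.Random(0))
```

In this sketch the support set plays the role of the few labeled training examples and the query set plays the role of the unlabeled testing data. Under the OOD setting described above, the two sets, and the source HG from which knowledge is transferred, would not be assumed to follow the same distribution.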
