Mixup has shown considerable success in mitigating the challenges posed by limited labeled data in image classification. By synthesizing samples through interpolation of features and labels, Mixup addresses data scarcity directly. However, it has rarely been explored in graph learning because graph data is irregular and its samples are connected to one another. In node classification in particular, the difficulty lies in creating connections for the synthetic nodes. In this paper, we propose Geometric Mixup (GeoMix), a simple and interpretable Mixup approach that leverages in-place graph editing. It uses geometry information to interpolate a node's features and labels with those of its nearby neighborhood, generating synthetic nodes and establishing connections for them. We conduct a theoretical analysis to explain the rationale for using geometry information in node Mixup, emphasizing the significance of locality enhancement, a critical aspect of our method's design. Extensive experiments demonstrate that our lightweight Geometric Mixup achieves state-of-the-art results on a wide variety of standard datasets with limited labeled data. Furthermore, it significantly improves the generalization capability of the underlying GNNs across various challenging out-of-distribution generalization tasks. Our code is available at https://github.com/WtaoZhao/geomix.
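To make the idea of interpolating with the nearby neighborhood concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each node is mixed with the unweighted mean of its direct neighbors, that a single coefficient `lam` is shared by all nodes, and that the edges are left untouched so each mixed node keeps its original connections. The function name `neighborhood_mixup` and all parameter names are hypothetical.

```python
def neighborhood_mixup(features, labels, neighbors, lam=0.5):
    """Mix each node's features and soft labels with the mean of its neighbors.

    Illustrative assumptions (not taken from the paper):
      - features[i] and labels[i] are lists of numbers for node i
        (labels are one-hot or soft label vectors),
      - neighbors[i] lists the indices of nodes adjacent to node i,
      - lam in [0, 1] is the weight kept on the node's own values.
    The edge structure is not modified, so every mixed node stays
    connected exactly as the original node was.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")

    def mean(rows):
        # Column-wise average of a non-empty list of equal-length vectors.
        n = len(rows)
        return [sum(col) / n for col in zip(*rows)]

    def blend(own, other):
        # Convex combination: lam * own + (1 - lam) * other.
        return [lam * a + (1.0 - lam) * b for a, b in zip(own, other)]

    mixed_x, mixed_y = [], []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            # Isolated node: nothing to mix with, so copy it unchanged.
            mixed_x.append(list(features[i]))
            mixed_y.append(list(labels[i]))
            continue
        mixed_x.append(blend(features[i], mean([features[j] for j in nbrs])))
        mixed_y.append(blend(labels[i], mean([labels[j] for j in nbrs])))
    return mixed_x, mixed_y
```

Because features and labels are blended with the same coefficient, a node surrounded by neighbors of another class receives a correspondingly softened label, which is the standard Mixup behavior restricted to a local neighborhood.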