Graph augmentation methods play a crucial role in improving the performance and generalisation of Graph Neural Networks (GNNs). Existing graph augmentation methods mainly perturb the graph structure and are usually limited to pairwise node relations. They therefore cannot fully capture the complexity of real-world large-scale networks, which often involve higher-order node relations that go beyond pairwise ones. Meanwhile, real-world graph datasets are predominantly modelled as simple graphs, owing to the scarcity of data that can be used to form higher-order edges. Integrating reconstructed higher-order edges into graph augmentation strategies is therefore a promising research direction for addressing these issues. In this paper, we present Hyperedge Augmentation (HyperAug), a novel graph augmentation method that constructs virtual hyperedges directly from the raw data and extracts auxiliary node features from the virtual hyperedge information, which are then used to enhance GNN performance on downstream tasks. We design three diverse virtual hyperedge construction strategies to accompany the augmentation scheme: (1) via graph statistics, (2) from multiple data perspectives, and (3) utilising multi-modality. Furthermore, to facilitate the evaluation of HyperAug, we provide 23 novel real-world graph datasets across various domains including social media, biology, and e-commerce. Our empirical study shows that HyperAug consistently and significantly outperforms GNN baselines and other graph augmentation methods across a variety of application contexts, indicating that it can effectively incorporate higher-order node relations into graph augmentation for real-world complex networks.
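To make the general idea concrete, the following is a minimal sketch of hyperedge-based feature augmentation in the spirit of strategy (1), not the paper's actual procedure. The abstract does not specify how virtual hyperedges are built or which features are extracted, so everything here is an illustrative assumption: each virtual hyperedge is taken to be a node together with its 1-hop neighbours, and the auxiliary features are the number of hyperedges a node belongs to and their mean size. The function names `neighbourhood_hyperedges`, `auxiliary_features`, and `augment` are hypothetical.

```python
from collections import defaultdict


def neighbourhood_hyperedges(edges, num_nodes):
    """Build virtual hyperedges from a simple graph (assumed strategy).

    Each candidate hyperedge is a node plus its 1-hop neighbours.
    Groups of size <= 2 are dropped because they add nothing beyond
    the pairwise edges; duplicate groups are merged.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    hyperedges = set()
    for node in range(num_nodes):
        members = frozenset(adj[node] | {node})
        if len(members) > 2:
            hyperedges.add(members)
    # Sort for a deterministic order.
    return sorted(hyperedges, key=sorted)


def auxiliary_features(hyperedges, num_nodes):
    """Per node: [hyperedge count, mean size of its hyperedges] (assumed features)."""
    count = [0] * num_nodes
    total_size = [0] * num_nodes
    for hyperedge in hyperedges:
        for node in hyperedge:
            count[node] += 1
            total_size[node] += len(hyperedge)
    return [
        [float(count[n]), total_size[n] / count[n] if count[n] else 0.0]
        for n in range(num_nodes)
    ]


def augment(features, aux):
    """Concatenate the original node features with the auxiliary ones."""
    return [list(f) + a for f, a in zip(features, aux)]


# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
features = [[1.0], [0.0], [1.0], [0.0]]

hyperedges = neighbourhood_hyperedges(edges, num_nodes=4)
aux = auxiliary_features(hyperedges, num_nodes=4)
augmented = augment(features, aux)
```

The augmented feature matrix can then be passed to any GNN in place of the original features, leaving the graph structure itself unchanged. This contrasts with augmentation methods that perturb the structure.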