Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs

Graphs are widely used to encapsulate a variety of data formats, but real-world networks often involve complex node relations that go beyond pairwise interactions. While hypergraphs and hierarchical graphs have been developed and employed to account for such relations, they cannot fully represent these complexities in practice. Additionally, although many Graph Neural Networks (GNNs) have been proposed for representation learning on higher-order graphs, they are usually evaluated only on simple graph datasets. Therefore, there is a need for unified modelling of higher-order graphs, together with a collection of comprehensive datasets and an accessible evaluation framework, to fully understand the performance of these algorithms on complex graphs. In this paper, we introduce the concept of hybrid graphs, a unified definition of higher-order graphs, and present the Hybrid Graph Benchmark (HGB). HGB contains 23 real-world hybrid graph datasets across various domains such as biology, social media, and e-commerce. Furthermore, we provide an extensible evaluation framework and a supporting codebase to facilitate the training and evaluation of GNNs on HGB. Our empirical study of existing GNNs on HGB reveals various research opportunities and gaps, including (1) evaluating the actual performance improvement of hypergraph GNNs over simple graph GNNs; (2) comparing the impact of different sampling strategies on hybrid graph learning methods; and (3) exploring ways to integrate simple graph and hypergraph information. We make our source code and full datasets publicly available at https://zehui127.github.io/hybrid-graph-benchmark/.
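The abstract does not give the formal definition of a hybrid graph. Assuming it means a single node set that carries both pairwise (simple-graph) edges and hyperedges, a minimal sketch of such a structure might look like the following. The class `HybridGraph` and its methods are hypothetical illustrations, not the HGB codebase's API, and clique expansion is shown only as one standard way to fold hyperedges into a simple graph, in the spirit of research direction (3); it is not presented as the paper's method.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class HybridGraph:
    """Hypothetical container: one node set carrying both pairwise edges
    and hyperedges (an assumed reading of 'hybrid graph')."""

    num_nodes: int
    # Undirected pairwise edges, stored as (smaller_id, larger_id).
    edges: set[tuple[int, int]] = field(default_factory=set)
    # Hyperedges: each is an unordered group of two or more distinct nodes.
    hyperedges: list[frozenset[int]] = field(default_factory=list)

    def _check(self, node: int) -> None:
        if not 0 <= node < self.num_nodes:
            raise ValueError(f"node {node} out of range [0, {self.num_nodes})")

    def add_edge(self, u: int, v: int) -> None:
        """Add an undirected simple-graph edge u-v."""
        self._check(u)
        self._check(v)
        if u == v:
            raise ValueError("self-loops are not supported in this sketch")
        self.edges.add((min(u, v), max(u, v)))

    def add_hyperedge(self, nodes) -> None:
        """Add a hyperedge joining all given nodes (duplicates are merged)."""
        group = frozenset(nodes)
        if len(group) < 2:
            raise ValueError("a hyperedge needs at least two distinct nodes")
        for n in group:
            self._check(n)
        self.hyperedges.append(group)

    def clique_expansion(self) -> set[tuple[int, int]]:
        """Fold hyperedges into pairwise edges by connecting every pair of
        nodes inside each hyperedge, then union with the simple edges."""
        expanded = set(self.edges)
        for group in self.hyperedges:
            for u, v in combinations(sorted(group), 2):
                expanded.add((u, v))
        return expanded


if __name__ == "__main__":
    g = HybridGraph(num_nodes=5)
    g.add_edge(1, 0)              # stored as (0, 1)
    g.add_hyperedge([1, 2, 3])    # one higher-order relation among 1, 2, 3
    print(sorted(g.clique_expansion()))  # [(0, 1), (1, 2), (1, 3), (2, 3)]
```

Note that clique expansion discards the fact that nodes 1, 2 and 3 formed a single group: the result is indistinguishable from three separate pairwise edges, which is one reason hypergraph-aware models may keep hyperedges explicit.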

翻译：图被广泛用于封装多种数据格式，但现实网络中的节点关系往往超越简单配对，涉及复杂节点关系。尽管超图与层次图已被开发并用于表示复杂节点关系，但它们在实际中无法完全体现这些复杂性。此外，尽管已提出众多图神经网络（GNN）用于高阶图的表示学习，但这些方法通常仅针对简单图数据集进行评估。因此，亟需一种高阶图的统一建模方法，以及包含全面数据集与可访问评估框架的集合，以全面理解这些算法在复杂图上的性能。本文提出混合图概念——一种高阶图的统一定义，并构建混合图基准（HGB）。HGB包含来自生物学、社交媒体及电子商务等不同领域的23个真实混合图数据集。此外，我们提供可扩展的评估框架及配套代码库，以支持在HGB上训练和评估GNN。我们基于现有GNN在HGB上的实证研究揭示了多种研究机遇与不足，包括：（1）评估超图GNN相较于简单图GNN的实际性能提升；（2）比较不同采样策略对混合图学习方法的影响；（3）探索整合简单图与超图信息的途径。我们在https://zehui127.github.io/hybrid-graph-benchmark/ 公开提供源代码与全部数据集。