Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs

Graphs are widely used to encapsulate a variety of data formats, but real-world networks often involve complex node relations that go beyond pairwise interactions. While hypergraphs and hierarchical graphs have been developed to account for such complex node relations, in practice they cannot fully represent these complexities. Additionally, although many Graph Neural Networks (GNNs) have been proposed for representation learning on higher-order graphs, they are usually evaluated only on simple graph datasets. Therefore, there is a need for a unified modelling of higher-order graphs, together with a collection of comprehensive datasets and an accessible evaluation framework, to fully understand how these algorithms perform on complex graphs. In this paper, we introduce the concept of hybrid graphs, a unified definition for higher-order graphs, and present the Hybrid Graph Benchmark (HGB). HGB contains 23 real-world hybrid graph datasets across various domains such as biology, social media, and e-commerce. Furthermore, we provide an extensible evaluation framework and a supporting codebase to facilitate the training and evaluation of GNNs on HGB. Our empirical study of existing GNNs on HGB reveals various research opportunities and gaps, including (1) evaluating the actual performance improvement of hypergraph GNNs over simple graph GNNs; (2) comparing the impact of different sampling strategies on hybrid graph learning methods; and (3) exploring ways to integrate simple graph and hypergraph information. We make our source code and full datasets publicly available at https://zehui127.github.io/hybrid-graph-benchmark/.
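To make the idea of a graph that carries both pairwise and higher-order relations concrete, here is a minimal sketch. It assumes a hybrid graph is stored as a node set with two kinds of relations: simple (pairwise) edges and hyperedges. The `HybridGraph` class and its methods are hypothetical illustrations, not the HGB codebase's API. The sketch also shows clique expansion, one standard but lossy way to project hyperedges onto pairwise edges. It serves here only as a simple example of combining the two kinds of information, not as the method studied in the paper.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class HybridGraph:
    """Hypothetical container: nodes 0..num_nodes-1 with pairwise edges and hyperedges."""
    num_nodes: int
    # Simple (pairwise) edges, stored as sorted (u, v) tuples so undirected duplicates collapse.
    edges: set = field(default_factory=set)
    # Higher-order relations: each hyperedge is a set of two or more node ids.
    hyperedges: list = field(default_factory=list)

    def add_edge(self, u: int, v: int) -> None:
        self.edges.add((min(u, v), max(u, v)))

    def add_hyperedge(self, nodes) -> None:
        self.hyperedges.append(frozenset(nodes))

    def clique_expansion(self) -> set:
        """Return pairwise edges plus every node pair that co-occurs in some hyperedge.

        This is the standard clique expansion; it loses information about which
        pairs came from which hyperedge, which is one reason hypergraph-aware
        models may behave differently from simple-graph models.
        """
        expanded = set(self.edges)
        for hyperedge in self.hyperedges:
            for u, v in combinations(sorted(hyperedge), 2):
                expanded.add((u, v))
        return expanded


# Usage: two pairwise edges plus one three-node hyperedge.
g = HybridGraph(num_nodes=5)
g.add_edge(0, 1)
g.add_edge(3, 4)
g.add_hyperedge({1, 2, 3})
print(sorted(g.clique_expansion()))  # [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]
```

Keeping the pairwise edges and hyperedges in separate fields, rather than expanding eagerly, leaves a model free to use either view or both.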
