Many real-world graphs present challenges for graph learning due to the presence of both heterophily and heterogeneity. However, existing graph learning benchmarks often focus on heterogeneous graphs with homophily or homogeneous graphs with heterophily, leaving a gap in understanding how methods perform on graphs that are both heterogeneous and heterophilic. To bridge this gap, we introduce H2GB, a novel graph benchmark that brings together the complexities of both heterophily and heterogeneity. Our benchmark encompasses 9 diverse real-world datasets across 5 domains, 28 baseline model implementations, and 26 benchmark results. In addition, we present a modular graph transformer framework, UnifiedGT, and a new model variant, H2G-former, that excels on this challenging benchmark. By integrating masked label embeddings, cross-type heterogeneous attention, and type-specific FFNs, H2G-former effectively tackles graph heterophily and heterogeneity. Extensive experiments across 26 baselines on H2GB reveal the inadequacies of current models on heterogeneous heterophilic graph learning, and demonstrate the superiority of H2G-former over existing solutions. Both the benchmark and the framework are available on GitHub (https://github.com/junhongmit/H2GB) and PyPI (https://pypi.org/project/H2GB), and documentation can be found at https://junhongmit.github.io/H2GB/.
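To illustrate one of the components named above, the sketch below shows the general idea behind type-specific FFNs: each node type is routed through its own feed-forward parameters, so representations of different node types are transformed independently. This is a hypothetical, minimal illustration of the concept only; the function names, weights, and the one-layer FFN shape are invented here and do not reflect the actual H2G-former implementation.

```python
# Hypothetical sketch of type-specific FFN dispatch (NOT the H2G-former code):
# each node type gets its own feed-forward transformation.

def make_ffn(weight, bias):
    """Return a toy one-layer FFN: ReLU(weight * x + bias), elementwise."""
    def ffn(features):
        return [max(0.0, weight * v + bias) for v in features]
    return ffn

# One FFN per node type; the types and parameters here are illustrative.
type_ffns = {
    "paper":  make_ffn(weight=2.0, bias=0.0),
    "author": make_ffn(weight=0.5, bias=1.0),
}

def apply_type_specific_ffn(node_type, features):
    """Route a node's feature vector through the FFN for its type."""
    return type_ffns[node_type](features)
```

For example, `apply_type_specific_ffn("paper", [1.0, -1.0])` yields `[2.0, 0.0]`, while the same input under the `"author"` FFN yields `[1.5, 0.5]`, showing that the transformation depends on the node's type.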