Fair graph learning plays a pivotal role in numerous practical applications. Many fair graph learning methods have been proposed recently; however, their evaluation often relies on poorly constructed semi-synthetic datasets or substandard real-world datasets. In such cases, even a basic Multilayer Perceptron (MLP) can outperform Graph Neural Networks (GNNs) in both utility and fairness. In this work, we show that many datasets fail to provide meaningful information in their edges, which calls into question the need for graph structure in these problems. To address these issues, we develop and introduce a collection of synthetic, semi-synthetic, and real-world datasets that fulfill a broad spectrum of requirements. These datasets are designed to include the relevant graph structures and bias information that are crucial for the fair evaluation of models. The proposed synthetic and semi-synthetic datasets expose controllable bias parameters, so that data with user-defined bias values can be generated easily. Moreover, we conduct systematic evaluations of the proposed datasets and establish a unified evaluation approach for fair graph learning models. Extensive experiments with fair graph learning methods across our datasets demonstrate that the datasets are effective for benchmarking these methods. Our datasets and the code for reproducing our experiments are available at https://github.com/XweiQ/Benchmark-GraphFairness.