Fair graph learning plays a pivotal role in many practical applications. Many fair graph learning methods have been proposed recently, but they are often evaluated on poorly constructed semi-synthetic datasets or low-quality real-world datasets. On such datasets, even a basic Multilayer Perceptron (MLP) can outperform Graph Neural Networks (GNNs) in both utility and fairness. In this work, we show that many of these datasets carry little meaningful information in their edges. This calls into question whether graph structure is needed for these problems at all. To address these issues, we develop a collection of synthetic, semi-synthetic, and real-world datasets that cover a broad range of requirements. The datasets are designed to contain both informative graph structure and the bias information needed to evaluate fair graph learning models. The synthetic and semi-synthetic datasets expose controllable bias parameters, so users can easily generate datasets with bias values of their choosing. We further evaluate the proposed datasets systematically and establish a unified evaluation protocol for fair graph learning models. Extensive experiments with fair graph learning methods on our datasets demonstrate that these datasets are effective for benchmarking such methods. Our datasets and the code for reproducing our experiments are available at https://github.com/XweiQ/Benchmark-GraphFairness.
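The abstract does not say how the bias parameters are implemented. The following Python sketch is a rough illustration under one plausible design, and is not the authors' generator. It assumes a binary sensitive attribute `s` and a binary label `y` that agrees with `s` with probability `label_bias`. Edges are drawn from a two-block stochastic block model in which `sens_homophily` sets the expected share of edges that join nodes with the same `s`. All function names and parameters here are hypothetical and do not come from the Benchmark-GraphFairness repository. The sketch also computes statistical parity and equal opportunity differences, two fairness metrics commonly reported in fair graph learning. Whether the benchmark's unified evaluation uses exactly these metrics is an assumption.

```python
import numpy as np

# Hypothetical sketch: NOT the Benchmark-GraphFairness implementation.


def make_biased_graph(n=1000, label_bias=0.8, sens_homophily=0.9,
                      avg_degree=10.0, seed=0):
    """Generate (s, y, edges) with controllable bias (hypothetical design).

    label_bias:     P(y == s); 0.5 means labels carry no information about s.
    sens_homophily: expected fraction of edges joining nodes with equal s.
    avg_degree:     expected mean node degree.
    Assumes n is large enough that both sensitive groups are non-empty.
    """
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, size=n)
    # Keep y equal to s with probability label_bias, otherwise flip it.
    flip = rng.random(n) >= label_bias
    y = np.where(flip, 1 - s, s)

    # Two-block SBM: choose within-group and between-group edge
    # probabilities that hit the target edge count and within-group share.
    n1 = int(s.sum())
    n0 = n - n1
    same_pairs = n0 * (n0 - 1) / 2 + n1 * (n1 - 1) / 2
    diff_pairs = n0 * n1
    target_edges = n * avg_degree / 2
    p_same = min(1.0, sens_homophily * target_edges / same_pairs)
    p_diff = min(1.0, (1 - sens_homophily) * target_edges / diff_pairs)

    # Sample every unordered pair (i < j) independently.
    rows, cols = np.triu_indices(n, k=1)
    same = s[rows] == s[cols]
    prob = np.where(same, p_same, p_diff)
    keep = rng.random(rows.size) < prob
    edges = np.stack([rows[keep], cols[keep]], axis=1)
    return s, y, edges


def edge_homophily(edges, s):
    """Fraction of edges whose endpoints share the sensitive attribute."""
    return float(np.mean(s[edges[:, 0]] == s[edges[:, 1]]))


def statistical_parity_diff(y_pred, s):
    """|P(y_pred = 1 | s = 0) - P(y_pred = 1 | s = 1)|."""
    return float(abs(y_pred[s == 0].mean() - y_pred[s == 1].mean()))


def equal_opportunity_diff(y_pred, y_true, s):
    """|P(y_pred = 1 | y = 1, s = 0) - P(y_pred = 1 | y = 1, s = 1)|."""
    pos = y_true == 1
    return float(abs(y_pred[pos & (s == 0)].mean()
                     - y_pred[pos & (s == 1)].mean()))


s, y, edges = make_biased_graph(n=1000, label_bias=0.8, sens_homophily=0.9)
print(edge_homophily(edges, s))       # roughly 0.9
print(statistical_parity_diff(y, s))  # roughly 0.6, i.e. 0.8 - 0.2
```

With these arguments the generated structure and labels are strongly tied to `s`. Setting both `label_bias` and `sens_homophily` to 0.5 removes the injected bias, which is the kind of knob a controllable benchmark needs for checking whether a model's fairness gains come from the graph or from the features alone.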