Over the past decades, tremendous effort has been devoted to generating realistic graphs in a variety of domains, from social networks and computer networks to gene regulatory networks and online transaction networks. Despite their remarkable success, the vast majority of these models are unsupervised and are typically trained to minimize the expected graph reconstruction loss, which leads to representation disparity in the generated graphs: the protected groups (often minorities) contribute less to the objective and thus suffer from systematically higher errors. In this paper, we aim to tailor graph generation to downstream mining tasks by leveraging label information and user-preferred parity constraints. In particular, we begin by investigating representation disparity in the context of graph generative models. To mitigate the disparity, we propose a fairness-aware graph generative model named FairGen. Our model jointly trains a label-informed graph generation module and a fair representation learning module by progressively learning the behaviors of the protected and unprotected groups, from the 'easy' concepts to the 'hard' ones. In addition, we propose a generic context sampling strategy for graph generative models, which is proven to fairly capture the contextual information of each group with high probability. Experimental results on seven real-world data sets, including web-based graphs, demonstrate that FairGen (1) performs on par with state-of-the-art graph generative models across nine network properties, (2) mitigates representation disparity in the generated graphs, and (3) improves model performance in downstream tasks by up to 17% via data augmentation.
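The representation disparity described above can be made concrete with a small numerical sketch. The numbers, group sizes, and the `group_losses` helper below are hypothetical and chosen only for illustration; they are not taken from the paper, and the group-balanced average shown at the end is a generic diagnostic, not FairGen's training objective.

```python
from collections import defaultdict


def group_losses(losses, groups):
    """Return (overall mean loss, per-group mean loss, group-balanced mean loss).

    Hypothetical helper for illustration; not part of FairGen.
    """
    buckets = defaultdict(list)
    for loss, group in zip(losses, groups):
        buckets[group].append(loss)

    # The usual objective: every node counts equally, so large groups dominate.
    overall = sum(losses) / len(losses)

    # Mean reconstruction loss inside each group.
    per_group = {g: sum(v) / len(v) for g, v in buckets.items()}

    # Every group counts equally, regardless of its size.
    balanced = sum(per_group.values()) / len(per_group)
    return overall, per_group, balanced


# Made-up per-node reconstruction losses: 90 unprotected nodes, 10 protected nodes.
losses = [0.1] * 90 + [0.5] * 10
groups = ["unprotected"] * 90 + ["protected"] * 10

overall, per_group, balanced = group_losses(losses, groups)
print(f"overall mean loss:        {overall:.2f}")
print(f"protected group loss:     {per_group['protected']:.2f}")
print(f"unprotected group loss:   {per_group['unprotected']:.2f}")
print(f"group-balanced mean loss: {balanced:.2f}")
```

With these made-up values, the overall mean loss is 0.14 even though the protected group's mean loss is 0.50, so an objective that only tracks the overall mean can look well optimized while the minority group is reconstructed poorly.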