Tremendous effort has been devoted over the past decades to generating realistic graphs in a variety of domains, ranging from social networks to computer networks and from gene regulatory networks to online transaction networks. Despite their remarkable success, the vast majority of these works are unsupervised and are typically trained to minimize the expected graph reconstruction loss. This leads to representation disparity in the generated graphs: the protected groups (often minorities) contribute less to the objective and thus suffer from systematically higher errors. In this paper, we aim to tailor graph generation to downstream mining tasks by leveraging label information and a user-preferred parity constraint. In particular, we begin by investigating representation disparity in the context of graph generative models. To mitigate the disparity, we propose a fairness-aware graph generative model named FairGen. Our model jointly trains a label-informed graph generation module and a fair representation learning module by progressively learning the behaviors of the protected and unprotected groups, from the "easy" concepts to the "hard" ones. In addition, we propose a generic context sampling strategy for graph generative models, which is proven to fairly capture the contextual information of each group with high probability. Experimental results on seven real-world data sets, including web-based graphs, demonstrate that FairGen (1) performs on par with state-of-the-art graph generative models across six network properties, (2) mitigates representation disparity in the generated graphs, and (3) substantially boosts model performance in downstream tasks, by up to 17%, via data augmentation.