Learning graph embeddings is a crucial task in graph mining. An effective graph embedding model learns low-dimensional representations from graph-structured data that can be published to benefit various downstream applications, such as node classification and link prediction. However, recent studies have revealed that graph embeddings are susceptible to attribute inference attacks, which allow attackers to infer private node attributes from the learned embeddings. To address this concern, privacy-preserving graph embedding methods have emerged that use adversarial learning to account for both the primary learning task and privacy protection. However, most existing methods assume that the representation model has access to all sensitive attributes in advance during training, which does not always hold because users have diverse privacy preferences. Furthermore, the adversarial learning commonly used in privacy-preserving representation learning suffers from unstable training. In this paper, we propose a novel approach called Private Variational Graph AutoEncoders (PVGAE), which uses an independent distribution penalty as a regularization term. Specifically, we split the original variational graph autoencoder (VGAE) into two sets of encoders that learn sensitive and non-sensitive latent representations, respectively. We further introduce a novel regularization that enforces independence between the two encoders and prove its theoretical effectiveness from the perspective of mutual information. Experimental results on three real-world datasets demonstrate that PVGAE outperforms other baselines in private embedding learning in terms of both utility and privacy protection.
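The split-encoder idea can be illustrated as a loss computation. The NumPy sketch below is a minimal illustration under stated assumptions and is not the PVGAE implementation. It uses two VGAE-style two-layer GCN encoders, one for the sensitive code and one for the non-sensitive code. It assumes the adjacency matrix is reconstructed from the concatenated codes with an inner-product decoder. In place of the paper's independent distribution penalty, whose exact form is not given here, it uses a simple cross-covariance penalty. All function names (`split_vgae_loss`, `cross_cov_penalty`, `init_params`, etc.) are hypothetical. Gradient-based training is omitted and would normally be handled by an autodiff framework.

```python
import numpy as np


def normalize_adj(adj):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_encoder(a_norm, x, params):
    """Two-layer GCN returning per-node mean and log-variance (VGAE-style)."""
    h = np.maximum(a_norm @ x @ params["w_hidden"], 0.0)  # ReLU
    mu = a_norm @ h @ params["w_mu"]
    logvar = a_norm @ h @ params["w_logvar"]
    return mu, logvar


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def kl_standard_normal(mu, logvar):
    """Mean over nodes of KL(N(mu, diag(sigma^2)) || N(0, I))."""
    return -0.5 * np.mean(np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1))


def recon_bce(adj, z):
    """Inner-product decoder: stable BCE between sigmoid(z z^T) and A."""
    logits = z @ z.T
    return np.mean(np.maximum(logits, 0.0) - logits * adj
                   + np.log1p(np.exp(-np.abs(logits))))


def cross_cov_penalty(z_s, z_n):
    """Hypothetical independence surrogate (not the paper's penalty):
    squared Frobenius norm of the cross-covariance between the two codes.
    Zero cross-covariance is necessary, but not sufficient, for independence."""
    zs = z_s - z_s.mean(axis=0)
    zn = z_n - z_n.mean(axis=0)
    c = zs.T @ zn / zs.shape[0]
    return float(np.sum(c**2))


def split_vgae_loss(adj, x, params_s, params_n, lam, rng):
    """Loss for a VGAE split into a sensitive and a non-sensitive encoder."""
    a_norm = normalize_adj(adj)
    mu_s, lv_s = gcn_encoder(a_norm, x, params_s)  # sensitive branch
    mu_n, lv_n = gcn_encoder(a_norm, x, params_n)  # non-sensitive branch
    z_s = reparameterize(mu_s, lv_s, rng)
    z_n = reparameterize(mu_n, lv_n, rng)
    # Assumption: the graph is reconstructed from the concatenated codes.
    recon = recon_bce(adj, np.concatenate([z_s, z_n], axis=1))
    kl = kl_standard_normal(mu_s, lv_s) + kl_standard_normal(mu_n, lv_n)
    indep = cross_cov_penalty(z_s, z_n)
    return recon + kl + lam * indep, z_s, z_n


def init_params(rng, f, h, d):
    """Small random weights for one encoder (feature dim f, hidden h, latent d)."""
    return {
        "w_hidden": 0.1 * rng.standard_normal((f, h)),
        "w_mu": 0.1 * rng.standard_normal((h, d)),
        "w_logvar": 0.1 * rng.standard_normal((h, d)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 4-node cycle graph with random 3-dimensional node features.
    adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0],
                    [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
    x = rng.standard_normal((4, 3))
    loss, z_s, z_n = split_vgae_loss(adj, x, init_params(rng, 3, 8, 2),
                                     init_params(rng, 3, 8, 2), lam=1.0, rng=rng)
    print(loss, z_s.shape, z_n.shape)
```

In a privacy setting, only the non-sensitive code `z_n` would be published. The weight `lam` trades reconstruction quality against how strongly the two codes are decorrelated. A decorrelation penalty is used here only because it is simple to write. It captures linear dependence only, whereas the regularizer described in the abstract targets independence more generally.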