The skip-gram model (SGM), which employs a neural network to generate node vectors, serves as the basis for numerous popular graph embedding techniques. However, since the training datasets contain sensitive linkage information, the parameters of a released SGM may encode private information and pose significant privacy risks. Differential privacy (DP) is a rigorous standard for protecting individual privacy in data analysis. Nevertheless, applying differential privacy to skip-gram on graphs is highly challenging: the complex link relationships can result in high sensitivity and thus necessitate substantial noise injection. To tackle this challenge, we present AdvSGM, a differentially private skip-gram for graphs via adversarial training. Our core idea is to leverage adversarial training to privatize skip-gram while improving its utility. Towards this end, we develop a novel adversarial training module by devising two optimizable noise terms that correspond to the parameters of the skip-gram. By fine-tuning the weights between modules within AdvSGM, we achieve differentially private gradient updates without additional noise injection. Extensive experimental results on six real-world graph datasets show that AdvSGM preserves high data utility across different downstream tasks.