Privacy and interpretability are two important ingredients for achieving trustworthy machine learning. We study the interplay of these two aspects in graph machine learning through graph reconstruction attacks, in which the adversary aims to reconstruct the graph structure of the training data given access to model explanations. We propose several graph reconstruction attacks that differ in the auxiliary information available to the adversary. We show that additional knowledge of post-hoc feature explanations substantially increases the success rate of these attacks. Further, we investigate in detail how attack performance differs across three classes of explanation methods for graph neural networks: gradient-based, perturbation-based, and surrogate model-based methods. While gradient-based explanations reveal the most about the graph structure, we find that these explanations do not always score high in utility. For the other two classes of explanations, privacy leakage increases with explanation utility. Finally, we propose a defense based on a randomized response mechanism for releasing the explanations, which substantially reduces the attack success rate. Our code is available at https://github.com/iyempissy/graph-stealing-attacks-with-explanation
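To make the attack and defense setting concrete, the following is a minimal sketch and not the implementation from the repository above. It assumes that feature explanations are released as binary masks (one per node), that the adversary predicts an edge whenever two nodes' explanations have high cosine similarity, and that the defense flips each mask bit independently via randomized response with privacy parameter `epsilon`. The function names, the threshold, and the toy data are all hypothetical; the paper's actual attacks and defense may differ in every one of these choices.

```python
import math
import random


def randomized_response(bits, epsilon, rng):
    """Keep each bit with probability e^eps / (1 + e^eps); flip it otherwise.

    Assumption: explanations are binary masks. Smaller epsilon means more flips.
    """
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [b if rng.random() < keep_prob else 1 - b for b in bits]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def reconstruct_edges(explanations, threshold):
    """Illustrative attack: predict edge (i, j) when explanations are similar."""
    n = len(explanations)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if cosine_similarity(explanations[i], explanations[j]) >= threshold
    }


# Toy data (hypothetical): nodes 0-1 and nodes 2-3 share explanation masks.
toy_explanations = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Attack on the raw explanations.
leaked_edges = reconstruct_edges(toy_explanations, threshold=0.9)

# Defense: perturb every mask before release, then rerun the same attack.
rng = random.Random(0)
released = [randomized_response(mask, epsilon=1.0, rng=rng) for mask in toy_explanations]
edges_after_defense = reconstruct_edges(released, threshold=0.9)
```

On the raw toy masks, the similarity attack recovers exactly the two pairs of nodes with identical explanations. After perturbation, the predicted edge set depends on the random flips; lowering `epsilon` pushes the keep probability toward 1/2 and makes the released masks less informative, at a cost in explanation utility.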