As a solution concept in cooperative game theory, the Shapley value is highly recognized in model interpretability studies and widely adopted by leading Machine Learning as a Service (MLaaS) providers such as Google, Microsoft, and IBM. However, although Shapley value-based model interpretability methods have been thoroughly studied, few researchers have considered the privacy risks incurred by Shapley values, even though interpretability and privacy are two foundations of machine learning (ML) models. In this paper, we investigate the privacy risks of Shapley value-based model interpretability methods via feature inference attacks: reconstructing the private model inputs from their Shapley value explanations. Specifically, we present two adversaries. The first reconstructs the private inputs by training an attack model on an auxiliary dataset, combined with black-box access to the model interpretability services. The second, even without any background knowledge, can successfully reconstruct most of the private features by exploiting the local linear correlations between the model inputs and outputs. We perform the proposed attacks on the leading MLaaS platforms, i.e., Google Cloud, Microsoft Azure, and IBM aix360. The experimental results demonstrate the vulnerability of the state-of-the-art Shapley value-based model interpretability methods deployed on these platforms and highlight the significance and necessity of designing privacy-preserving model interpretability methods in future studies. To the best of our knowledge, this is the first work to investigate the privacy risks of Shapley values.
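To illustrate the intuition behind the second adversary, consider a minimal sketch, assuming a model that is linear in a neighborhood of the input. For a linear model f(x) = w·x + b explained against a reference point r, the exact Shapley value of feature i is φᵢ = wᵢ(xᵢ − rᵢ), so an adversary who can locally estimate w and r can invert the explanation to recover the private input. The specific variable names and the synthetic data below are illustrative, not taken from the paper:

```python
import numpy as np

# Sketch of feature inference from Shapley explanations of a (locally) linear
# model. For f(x) = w @ x + b with reference r, the exact Shapley value of
# feature i is phi_i = w_i * (x_i - r_i), which the adversary can invert.
rng = np.random.default_rng(0)
w = 1.0 + rng.random(5)          # model weights (assumed locally estimable; nonzero)
r = rng.normal(size=5)           # reference/baseline input used by the explainer
x_private = rng.normal(size=5)   # the private input the adversary wants to recover

phi = w * (x_private - r)        # Shapley explanation returned by the service

# Inversion: exact whenever the model behaves linearly around x_private.
x_reconstructed = phi / w + r
assert np.allclose(x_reconstructed, x_private)
```

For nonlinear models the same inversion applies approximately in a local neighborhood, which is why the attack recovers most, rather than all, private features.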