Membership inference (MI) attacks are currently the most popular test for measuring privacy leakage in machine learning models. Given a machine learning model, a data point, and some auxiliary information, the goal of an MI attack is to determine whether the data point was used to train the model. In this work, we study the reliability of membership inference attacks in practice. Specifically, we show that a model owner can plausibly refute the result of a membership inference test on a data point $x$ by constructing a proof of repudiation that proves that the model was trained without $x$. We design efficient algorithms to construct proofs of repudiation for every data point in the training dataset. Our empirical evaluation demonstrates the practical feasibility of these algorithms by constructing proofs of repudiation for popular machine learning models on MNIST and CIFAR-10. Consequently, our results call for a re-evaluation of the implications of membership inference attacks in practice.
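To make the membership inference test described above concrete, the following is a minimal sketch of one simple baseline, a loss-threshold test: the data point is guessed to be a training member when the model's loss on it falls below a threshold. This is an illustration only and is not the attack or the repudiation algorithm studied in this work. The names `cross_entropy`, `mi_test`, and `toy_predict`, the toy model, and the threshold value are all hypothetical; in practice the threshold would be calibrated from the auxiliary information available to the attacker.

```python
import math
from typing import Callable, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true label under the model's output."""
    # Clamp to avoid log(0) when the model assigns zero probability.
    return -math.log(max(probs[label], 1e-12))


def mi_test(
    predict: Callable[[Sequence[float]], Sequence[float]],
    x: Sequence[float],
    label: int,
    threshold: float,
) -> bool:
    """Guess 'member' (True) when the loss on (x, label) is below the threshold."""
    return cross_entropy(predict(x), label) < threshold


def toy_predict(x: Sequence[float]) -> Sequence[float]:
    """Hypothetical two-class model that is overconfident on one memorized point."""
    memorized = {(0.0, 1.0): [0.02, 0.98]}
    # Any other input gets an uninformative prediction.
    return memorized.get(tuple(x), [0.5, 0.5])


if __name__ == "__main__":
    threshold = 0.3  # assumed value, chosen only for this toy example
    print(mi_test(toy_predict, (0.0, 1.0), 1, threshold))  # low loss: guessed member
    print(mi_test(toy_predict, (1.0, 0.0), 1, threshold))  # high loss: guessed non-member
```

The test outputs only a guess based on the model's behavior on $x$, which is why its result can be disputed by a model owner who exhibits a training run that plausibly did not include $x$.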