We study per-datum Membership Inference Attacks (MIAs), in which an attacker aims to infer whether a fixed target datum was included in the input dataset of an algorithm, thereby violating privacy. First, we define the membership leakage of a datum as the advantage of the optimal adversary trying to identify it. Then, we quantify the per-datum membership leakage for the empirical mean and show that it depends on the Mahalanobis distance between the target datum and the data-generating distribution. We further assess the effect of two privacy defences, namely adding Gaussian noise and sub-sampling, and quantify exactly how each of them decreases the per-datum membership leakage. Our analysis builds on a novel proof technique that combines an Edgeworth expansion of the likelihood ratio test with a Lindeberg-Feller central limit theorem. It connects the existing likelihood ratio and scalar product attacks, and also justifies different canary selection strategies used in the privacy auditing literature. Finally, our experiments demonstrate the impact of the leakage score, the sub-sampling ratio, and the noise scale on the per-datum membership leakage, as predicted by the theory.
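As an illustration of the quantity the leakage is said to depend on, the following minimal sketch computes the squared Mahalanobis distance between a target datum and a data-generating distribution. The abstract does not state the exact leakage formula, so this shows only the distance itself; the function name `mahalanobis_sq`, the mean `mu`, the covariance `cov`, and the two example points are assumptions chosen for illustration.

```python
import numpy as np


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance of x from a distribution
    with mean mu and covariance matrix cov (hypothetical helper)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve cov @ z = diff rather than forming the explicit inverse.
    return float(diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff))


# Assumed toy distribution: zero mean, diagonal covariance in 5 dimensions.
d = 5
mu = np.zeros(d)
cov = np.diag(np.arange(1.0, d + 1.0))

# A datum close to the mean and a datum far from it.
typical = mu + 0.1 * np.ones(d)
outlier = mu + 5.0 * np.ones(d)

score_typical = mahalanobis_sq(typical, mu, cov)
score_outlier = mahalanobis_sq(outlier, mu, cov)

print(f"typical datum: {score_typical:.4f}")
print(f"outlier datum: {score_outlier:.4f}")
```

In this toy setting the outlier has a much larger distance than the typical point, which is the kind of datum the abstract's canary selection discussion is concerned with.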