Membership inference attacks (MIAs) are popular methods for empirically assessing how much sensitive information about the training data leaks through models or statistics learned from that data. MIA vulnerability is often evaluated through the false positive rate (FPR) and true positive rate (TPR) of a binary classifier that tries to predict whether a particular sample was in the training data. However, reliably estimating the TPR, especially at low FPR values, requires many observations, which for MIAs means many target models and therefore a large computational cost. To avoid excessive compute requirements, MIA scores are often averaged over multiple individuals and multiple target models. We demonstrate two key weaknesses in this efficient MIA evaluation pipeline. First, we show that evaluating the TPR on MIA scores concatenated across multiple individuals, a common approach for studying vulnerability in the very low FPR regime, is not calibrated across the per-sample FPRs. This makes it unreliable as a tool for auditing differential privacy. To solve this, we propose a post-processing method that calibrates the FPR across different samples. Second, we identify a finite population bias in the commonly used efficient likelihood-ratio attack (LiRA) implementation proposed by Carlini et al. (2022), which leads to a positive bias in the per-sample vulnerability.
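The calibration issue described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' method or data: the Gaussian score distributions, the two hypothetical samples A and B, the helper names `threshold_for_fpr` and `rate_above`, and the 5% target FPR are all assumptions chosen for illustration. It picks one threshold on the concatenated OUT scores and then checks which FPR that threshold implies for each sample separately. The per-sample thresholds at the end are only a naive reference point, not the post-processing method proposed in the paper.

```python
import random


def threshold_for_fpr(out_scores, target_fpr):
    """Return a threshold with about target_fpr of the OUT scores strictly above it."""
    ordered = sorted(out_scores, reverse=True)
    k = int(target_fpr * len(ordered))  # number of false positives allowed
    return ordered[k]


def rate_above(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


rng = random.Random(0)
n_models = 20000   # hypothetical number of target models per sample
target_fpr = 0.05  # hypothetical target FPR

# Assumed score distributions: sample A has tightly concentrated scores,
# sample B has widely spread scores. IN scores are shifted upward by 1.
out_a = [rng.gauss(0.0, 1.0) for _ in range(n_models)]
in_a = [rng.gauss(1.0, 1.0) for _ in range(n_models)]
out_b = [rng.gauss(0.0, 3.0) for _ in range(n_models)]
in_b = [rng.gauss(1.0, 3.0) for _ in range(n_models)]

# Pooled evaluation: a single threshold chosen on the concatenated OUT scores.
pooled_threshold = threshold_for_fpr(out_a + out_b, target_fpr)
pooled_fpr = rate_above(out_a + out_b, pooled_threshold)
pooled_tpr = rate_above(in_a + in_b, pooled_threshold)

# Per-sample FPRs implied by that single pooled threshold.
fpr_a = rate_above(out_a, pooled_threshold)
fpr_b = rate_above(out_b, pooled_threshold)

# Naive reference: each sample thresholded at its own target FPR.
tpr_a = rate_above(in_a, threshold_for_fpr(out_a, target_fpr))
tpr_b = rate_above(in_b, threshold_for_fpr(out_b, target_fpr))

print(f"pooled: FPR={pooled_fpr:.4f}, TPR={pooled_tpr:.4f}")
print(f"per-sample FPR at pooled threshold: A={fpr_a:.4f}, B={fpr_b:.4f}")
print(f"per-sample TPR at own {target_fpr:.0%} FPR: A={tpr_a:.4f}, B={tpr_b:.4f}")
```

Under these assumptions the pooled FPR matches the target, yet almost all false positives come from sample B, so the pooled TPR does not correspond to any single per-sample operating point.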