Membership Inference Attacks (MIAs) are widely used to quantify training-data memorization and assess privacy risks. Standard evaluation requires repeated retraining, which is computationally costly for large models. One-run methods (a single training run with randomized data inclusion) and zero-run methods (post hoc evaluation) are often used instead, though their statistical validity remains unclear. To address this gap, we frame MIA evaluation as a causal inference problem, defining memorization as the causal effect of including a data point in the training set. This formulation reveals and formalizes key sources of bias in existing protocols: one-run methods suffer from interference between jointly included points, while zero-run evaluations popular for LLMs are confounded by non-random membership assignment. We derive causal analogues of standard MIA metrics and propose practical estimators for the multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees. Experiments on real-world data show that our approach enables reliable memorization measurement even under distribution shift and when retraining is impractical, providing a principled foundation for privacy evaluation in modern AI systems.
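To make the causal framing concrete, the following toy sketch (not the paper's estimator; all names and the simplistic "model" are illustrative assumptions) measures memorization of a point as the average causal effect of its inclusion on the model's loss at that point, estimated by repeated retraining with randomized membership:

```python
import random
import statistics

# Toy "model": predicts the mean of its training set.
# Memorization of a target point x is framed causally: the
# difference in the model's squared error at x when x is held
# out (control) versus included (treated), averaged over random
# draws of the rest of the training data. This mimics the
# multi-run regime; names here are illustrative, not the paper's.

def train(data):
    return sum(data) / len(data)  # fitted "model" parameter

def loss(model, x):
    return (model - x) ** 2

def memorization_effect(x, population, n_runs=2000, n_sample=20, seed=0):
    rng = random.Random(seed)
    effects = []
    for _ in range(n_runs):
        rest = rng.sample(population, n_sample)
        m_out = train(rest)        # control run: x excluded
        m_in = train(rest + [x])   # treated run: x included
        effects.append(loss(m_out, x) - loss(m_in, x))
    return statistics.mean(effects)

rng = random.Random(1)
population = [rng.gauss(0.0, 1.0) for _ in range(500)]

# An atypical (outlier) point shows a larger causal effect of
# inclusion than a typical point, i.e. it is "memorized" more.
print(memorization_effect(5.0, population))   # large effect
print(memorization_effect(0.0, population))   # near-zero effect
```

Averaging over fresh resamples of the remaining data is what distinguishes this multi-run estimate from a one-run protocol, where many candidate points share a single training run and can interfere with one another's measured effects.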