Membership inference attacks (MIAs) can reveal whether a particular data point was part of the training dataset, potentially exposing sensitive information about individuals. This article provides theoretical guarantees by exploring the fundamental statistical limitations of MIAs on machine learning models. More precisely, we first derive the statistical quantity that governs the success of such attacks. We then deduce that, in a very general regression setting with overfitting algorithms, attacks may have a high probability of success. Finally, we investigate several situations in which we provide bounds on this quantity of interest. Our results enable us to deduce the accuracy of potential attacks from the number of samples and other structural parameters of learning models. In certain instances, these parameters can be directly estimated from the dataset.
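The claim about overfitting regression can be illustrated with a small simulation. The sketch below is not the construction analysed in the article: it assumes an overparameterized linear model fitted by minimum-norm least squares (so it interpolates the training set) and a simple loss-threshold attack that guesses "member" whenever the squared residual is near zero. The function name, the dimensions, the noise level, and the threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def run_loss_threshold_attack(n_train=50, n_features=200, n_test=50,
                              noise=0.5, threshold=1e-6, seed=0):
    """Return the accuracy of a loss-threshold membership attack.

    Illustrative assumption: the model is min-norm least squares with
    more features than training samples, so it fits the training data
    exactly (an overfitting algorithm in the regression setting).
    """
    rng = np.random.default_rng(seed)

    # Synthetic linear data with additive Gaussian noise.
    w_true = rng.normal(size=n_features) / np.sqrt(n_features)
    X_train = rng.normal(size=(n_train, n_features))
    X_test = rng.normal(size=(n_test, n_features))
    y_train = X_train @ w_true + noise * rng.normal(size=n_train)
    y_test = X_test @ w_true + noise * rng.normal(size=n_test)

    # For an underdetermined system, lstsq returns the minimum-norm
    # solution, which interpolates the training labels.
    w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    # Per-sample squared residuals for members and non-members.
    loss_members = (X_train @ w_hat - y_train) ** 2
    loss_non_members = (X_test @ w_hat - y_test) ** 2

    # Attack rule: guess "member" when the loss is below the threshold.
    correct_on_members = np.sum(loss_members < threshold)
    correct_on_non_members = np.sum(loss_non_members >= threshold)

    return (correct_on_members + correct_on_non_members) / (n_train + n_test)


if __name__ == "__main__":
    print("attack accuracy:", run_loss_threshold_attack())
```

Because the fitted model reproduces the training labels up to floating-point error while fresh points still carry noise-level residuals, the attack separates the two groups almost perfectly in this toy setting. Shrinking `n_features` below `n_train` removes the interpolation and should make the attack noticeably weaker.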