As machine learning security evolves, membership inference attacks have emerged as a potent threat to the confidentiality of sensitive data. In such an attack, an adversary aims to determine whether a particular data point was used to train a target model. This paper proposes a new method for gauging a data point's membership in a model's training set. Instead of correlating loss with membership, as is traditionally done, we leverage the fact that training examples generally exhibit higher confidence values when classified into their actual class. During training, the model is fit to the training data and may generalize less well to unseen data. This asymmetry leads the model to achieve higher confidence on training data, because it exploits the specific patterns and noise present in that data. The confidence values produced by the model provide a probabilistic measure of its certainty in its predictions, and we use them to infer the membership of a given data point. We also introduce a variant of our method that carries out the attack without knowing the ground truth (true class) of a given data point, offering an edge over existing label-dependent attack methods.
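The idea of scoring membership by confidence can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the fixed decision threshold, and the use of the maximum predicted probability as the label-free score are all assumptions made for illustration, since the abstract does not specify how the confidence values are turned into a decision.

```python
import numpy as np


def softmax(logits):
    """Convert raw model outputs to per-class confidence values."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def true_class_confidence(probs, labels):
    """Label-dependent score: confidence assigned to the true class."""
    return probs[np.arange(len(labels)), labels]


def max_confidence(probs):
    """Label-free score (assumed variant): highest confidence over all classes."""
    return probs.max(axis=1)


def infer_membership(scores, threshold):
    """Flag a point as a suspected training member when its score is high.

    The threshold is a hypothetical parameter; in practice it would have to
    be calibrated, for example on data whose membership is known.
    """
    return scores >= threshold


# Toy example: one confidently classified point and one uncertain point.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.5, 0.4, 0.3]])
labels = np.array([0, 1])
probs = softmax(logits)

with_labels = infer_membership(true_class_confidence(probs, labels), threshold=0.8)
without_labels = infer_membership(max_confidence(probs), threshold=0.8)
```

In this toy case both scores flag only the first point, because the model is confident there and close to uniform on the second. The two scores can disagree when the model is confidently wrong, which is where a label-free attack and a label-dependent one would differ.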