Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners who share their datasets to train machine learning models. Several property inference attacks against deep neural networks have been proposed, but they all rely on the attacker training a large number of shadow models, which incurs a large computational overhead. In this paper, we consider the setting in which the attacker can poison a subset of the training dataset and query the trained target model. Motivated by our theoretical analysis of model confidences under poisoning, we design an efficient property inference attack, SNAP, which achieves higher attack success and requires less poisoning than the state-of-the-art poisoning-based property inference attack by Mahloujifar et al. For example, on the Census dataset, SNAP achieves a 34% higher success rate than Mahloujifar et al. while being 56.5x faster. We also extend our attack to infer whether a certain property was present at all during training, and to efficiently estimate the exact proportion of a property of interest. We evaluate our attack on several properties of varying proportions from four datasets and demonstrate SNAP's generality and effectiveness. An open-source implementation of SNAP can be found at https://github.com/johnmath/snap-sp23.