Privacy attacks on Machine Learning (ML) models often focus on inferring whether particular data points were present in the training data. However, what the adversary really wants to know is whether a particular individual's (subject's) data was included during training. In such scenarios, the adversary is more likely to have access to a particular subject's data distribution than to that subject's actual records. Furthermore, in settings such as cross-silo Federated Learning (FL), a subject's data can be embodied by multiple data records spread across multiple organizations. Nearly all of the existing private FL literature studies privacy at one of two granularities: item-level (individual data records) and user-level (a participating user in the federation). Neither of these applies to data subjects in cross-silo FL. This insight motivates us to shift our attention from the privacy of data records to the privacy of data subjects, also known as subject-level privacy. We propose two novel black-box attacks for subject membership inference, one of which assumes access to the model after each training round. Using these attacks, we estimate subject membership inference risk on real-world data for single-party models as well as for FL scenarios. We find our attacks to be extremely potent, even without access to exact training records and with membership knowledge for only a handful of subjects. To better understand the factors that may influence subject privacy risk in cross-silo FL settings, we systematically generate several hundred synthetic federation configurations, varying properties of the data, the model design and training, and the federation itself. Finally, we investigate how effectively Differential Privacy mitigates this threat.