Synthetic data is emerging as the most promising solution for sharing individual-level data while safeguarding privacy. Membership inference attacks (MIAs) based on shadow modeling have become the standard for evaluating the privacy of synthetic data. These attacks, however, currently assume that the attacker has access to an auxiliary dataset sampled from a distribution similar to that of the training dataset. This is often a very strong assumption, one that would make an attack unlikely to happen in practice. Here we show how this assumption can be removed and how MIAs can be performed using only the synthetic data. More specifically, across three attack scenarios that use only synthetic data, two real-world datasets, and two synthetic data generators, our results demonstrate that MIAs remain successful. These results show that the strong assumption made when auditing synthetic data releases, namely access to an auxiliary dataset, can be relaxed to perform an actual attack.