Training machine learning models with differential privacy (DP) limits an adversary's ability to infer sensitive information about the training data. The guarantee can be interpreted as a bound on the adversary's ability to distinguish two adjacent datasets, where adjacency is defined by a chosen relation. In practice, most DP implementations use the add/remove adjacency relation, under which two datasets are adjacent if one can be obtained from the other by adding or removing a single record; this protects membership. In many ML applications, however, the goal is to protect attributes of individual records (e.g., labels used in supervised fine-tuning). We show that privacy accounting under add/remove overstates attribute privacy compared to accounting under the substitute adjacency relation, which permits replacing one record with another. To demonstrate this gap, we develop novel attacks to audit DP under substitute adjacency and show empirically that the audit results are inconsistent with the DP guarantees reported under add/remove, yet remain consistent with the budget accounted under substitute adjacency. Our results highlight that the choice of adjacency relation when reporting DP guarantees is critical when the protection target is per-record attributes rather than membership.
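To make the gap between the two adjacency relations concrete, the following is a minimal sketch, not the accounting or the attacks used in this work. It assumes a single Gaussian-mechanism release of a sum of per-record vectors clipped to L2 norm `C`, with no subsampling or composition. Under add/remove, one record changes the sum by at most `C`; under substitute, replacing a record (for example, by changing its label so that its gradient flips) can change the sum by up to `2C`. All names (`clip`, `gaussian_delta`, `gaussian_epsilon`, `C`, `sigma`) are hypothetical and chosen for illustration; the epsilon conversion uses the standard closed-form privacy profile of the Gaussian mechanism.

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def clip(vec, c):
    """Scale vec so that its L2 norm is at most c."""
    norm = l2_norm(vec)
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]

def norm_cdf(x):
    # erfc keeps precision in the lower tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def gaussian_delta(eps, mu):
    """delta(eps) for a Gaussian mechanism with sensitivity/sigma = mu."""
    return norm_cdf(-eps / mu + mu / 2) - math.exp(eps) * norm_cdf(-eps / mu - mu / 2)

def gaussian_epsilon(mu, delta):
    """Smallest eps with gaussian_delta(eps, mu) <= delta, found by bisection."""
    if gaussian_delta(0.0, mu) <= delta:
        return 0.0
    lo, hi = 0.0, 1.0
    while gaussian_delta(hi, mu) > delta:
        lo, hi = hi, hi * 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return hi

# Assumed illustration parameters: clipping norm, noise std, target delta.
C, sigma, delta = 1.0, 1.0, 1e-5

# One record's clipped contribution, and a worst-case substitute for it.
g = clip([3.0, 4.0], C)
g_sub = clip([-3.0, -4.0], C)

# Add/remove: the record's contribution appears or disappears.
sens_add_remove = l2_norm(g)
# Substitute: the record's contribution is swapped for another one.
sens_substitute = l2_norm([a - b for a, b in zip(g, g_sub)])

eps_add_remove = gaussian_epsilon(sens_add_remove / sigma, delta)
eps_substitute = gaussian_epsilon(sens_substitute / sigma, delta)

print(f"add/remove: sensitivity={sens_add_remove:.2f}, eps={eps_add_remove:.2f}")
print(f"substitute: sensitivity={sens_substitute:.2f}, eps={eps_substitute:.2f}")
```

With the same noise level, the epsilon computed under substitute adjacency is larger than the one computed under add/remove, so a guarantee reported under add/remove would look stronger than what holds for a substituted record in this simplified setting.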