Differential privacy (DP) auditing is essential for evaluating privacy guarantees in machine learning systems. Existing auditing methods, however, pose a significant challenge for large-scale systems since they require modifying the training dataset -- for instance, by injecting out-of-distribution canaries or removing samples from training. Such interventions on the training data pipeline are resource-intensive and involve considerable engineering overhead. We introduce a novel observational auditing framework that leverages the inherent randomness of data distributions, enabling privacy evaluation without altering the original dataset. Our approach extends privacy auditing beyond traditional membership inference to protected attributes, with labels as a special case, addressing a key gap in existing techniques. We provide theoretical foundations for our method and perform experiments on Criteo and CIFAR-10 datasets that demonstrate its effectiveness in auditing label privacy guarantees. This work opens new avenues for practical privacy auditing in large-scale production environments.