Local differential privacy (LDP) offers rigorous, quantifiable privacy guarantees for personal data by perturbing the data at its source. Understanding how these perturbations affect classifier utility is crucial for both designers and users. However, a general theoretical framework for quantifying this impact is lacking, and developing one is challenging, especially for complex or black-box classifiers. This paper presents a unified framework for theoretically quantifying classifier utility under LDP mechanisms. The key insight is that LDP perturbations are concentrated around the original data with a specific probability, so utility analysis can be reframed as robustness analysis within this concentrated region. Our framework thus connects the concentration properties of LDP mechanisms with the robustness of classifiers, treating LDP mechanisms as general distributional functions and classifiers as black boxes. This generality makes the framework applicable to any LDP mechanism and any classifier. A direct application of our utility quantification is to guide the selection of LDP mechanisms and privacy parameters for a given classifier. Notably, our analysis shows that a piecewise-based mechanism often yields better utility than alternatives in common scenarios. Beyond the core framework, we introduce two novel refinement techniques that further improve utility quantification. We then present case studies illustrating utility quantification for various combinations of LDP mechanisms and classifiers. The results demonstrate that our theoretical quantification closely matches empirical observations, particularly when classifiers operate in lower-dimensional input spaces.
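The concentration-to-robustness idea can be illustrated with a minimal one-dimensional sketch. It is not the paper's framework: it assumes a scalar input in [-1, 1], the standard Laplace mechanism with scale `sensitivity / epsilon`, and a hypothetical threshold classifier whose robustness radius is simply the distance to the threshold. All function names are illustrative. If the perturbed value stays within that radius with probability p, the prediction is preserved with probability at least p.

```python
import math
import random


def laplace_concentration(theta, epsilon, sensitivity=2.0):
    """P(|noise| <= theta) for Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return 1.0 - math.exp(-theta / scale)


def sample_laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def classifier(x, threshold=0.0):
    # Hypothetical stand-in for a black-box classifier.
    return 1 if x >= threshold else 0


def utility_lower_bound(x, epsilon, threshold=0.0, sensitivity=2.0):
    # Robustness radius: any perturbation smaller than the distance to the
    # decision threshold cannot change the prediction.
    theta = abs(x - threshold)
    return laplace_concentration(theta, epsilon, sensitivity)


def empirical_utility(x, epsilon, trials=20000, seed=0,
                      threshold=0.0, sensitivity=2.0):
    # Fraction of perturbed inputs whose prediction matches the original.
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    label = classifier(x, threshold)
    kept = sum(
        classifier(x + sample_laplace(scale, rng), threshold) == label
        for _ in range(trials)
    )
    return kept / trials


if __name__ == "__main__":
    x, eps = 0.5, 2.0
    print("lower bound:", utility_lower_bound(x, eps))
    print("empirical  :", empirical_utility(x, eps))
```

In this toy setting the bound is conservative, because noise that pushes the input away from the threshold never changes the prediction yet is not credited by the symmetric concentration region. Tightening the gap between such a bound and observed utility is the kind of improvement the abstract attributes to its refinement techniques, which are not reproduced here.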