Quantifying Classifier Utility under Local Differential Privacy

Local differential privacy (LDP) offers rigorous, quantifiable privacy guarantees for personal data by introducing perturbations at the data source. Understanding how these perturbations affect classifier utility is crucial for both designers and users. However, a general theoretical framework for quantifying this impact is lacking, and building one is challenging, especially for complex or black-box classifiers. This paper presents a unified framework for theoretically quantifying classifier utility under LDP mechanisms. The key insight is that LDP perturbations are concentrated around the original data with a specific probability, which allows utility analysis to be reframed as robustness analysis within this concentrated region. Our framework thus connects the concentration properties of LDP mechanisms with the robustness of classifiers, treating LDP mechanisms as general distributional functions and classifiers as black boxes. This generality makes the framework applicable to any LDP mechanism and any classifier. A direct application of our utility quantification is guiding the selection of LDP mechanisms and privacy parameters for a given classifier. Notably, our analysis shows that a piecewise-based mechanism often yields better utility than alternatives in common scenarios. Beyond the core framework, we introduce two novel refinement techniques that further improve utility quantification. We then present case studies illustrating utility quantification for various combinations of LDP mechanisms and classifiers. The results demonstrate that our theoretical quantification closely matches empirical observations, particularly when classifiers operate in lower-dimensional input spaces.
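The concentration-to-robustness idea can be illustrated with a small sketch. This is not the paper's framework, only a toy instance under stated assumptions: a scalar input in [0, 1], the Laplace mechanism with sensitivity 1 as the LDP mechanism, and a hypothetical threshold classifier standing in for the black box. If the classifier's prediction is unchanged within radius theta of the input, then the probability that the perturbed input lands inside that radius is a lower bound on the probability that the prediction is preserved.

```python
import math
import random


def laplace_mechanism(x, epsilon, sensitivity=1.0, rng=random):
    """Perturb scalar x with Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return x + noise


def concentration_probability(theta, epsilon, sensitivity=1.0):
    """P(|noise| <= theta) for Laplace noise of scale sensitivity / epsilon."""
    return 1.0 - math.exp(-theta * epsilon / sensitivity)


def threshold_classifier(x):
    """Hypothetical black-box classifier (assumption): label 1 iff x >= 0.5."""
    return int(x >= 0.5)


def empirical_utility(classifier, x, epsilon, trials=20000, seed=0):
    """Fraction of perturbed inputs whose predicted label matches the clean one."""
    rng = random.Random(seed)
    label = classifier(x)
    hits = sum(
        classifier(laplace_mechanism(x, epsilon, rng=rng)) == label
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    x, epsilon = 0.9, 4.0
    theta = abs(x - 0.5)  # radius within which the prediction cannot change
    bound = concentration_probability(theta, epsilon)
    observed = empirical_utility(threshold_classifier, x, epsilon)
    print(f"theoretical lower bound: {bound:.3f}")
    print(f"empirical utility:       {observed:.3f}")
```

In this toy setting the bound is loose because noise that pushes the input away from the decision boundary never flips the label; tightening bounds of this kind is presumably the sort of gap the refinement techniques mentioned in the abstract address.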

