We propose and study a new privacy definition, termed Probably Approximately Correct (PAC) Privacy. PAC Privacy characterizes the information-theoretic hardness of recovering sensitive data given arbitrary information disclosure or leakage during or after any processing. Unlike the classic cryptographic definition and Differential Privacy (DP), which consider the adversarial (input-independent) worst case, PAC Privacy is a simulatable metric that quantifies the instance-based impossibility of inference. We propose a fully automatic analysis and proof generation framework: security parameters can be produced with arbitrarily high confidence via Monte Carlo simulation for any black-box data processing oracle. This automation enables the analysis of complicated data processing, for which worst-case proofs in the classic privacy regime can be loose or even intractable. Moreover, we show that the resulting PAC Privacy guarantees enjoy simple composition bounds, and that the automatic analysis framework can be implemented in an online fashion to analyze the composite PAC Privacy loss even under correlated randomness. On the utility side, the magnitude of the (necessary) perturbation required in PAC Privacy is not lower bounded by $\Theta(\sqrt{d})$ for a $d$-dimensional release but can be $O(1)$ for many practical data processing tasks, in contrast to the input-independent worst-case information-theoretic lower bound. Example applications of PAC Privacy are included, with comparisons to existing work.
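To make the idea of simulating a black-box data processing oracle concrete, the following is a minimal sketch and not the framework proposed in this work. It assumes that leakage is measured by a mutual information budget (in nats) and that the release is perturbed with isotropic Gaussian noise. It relies on the standard Gaussian-channel bound $I(Z; Z + B) \le \frac{1}{2} \log\det(I + \Sigma_Z \Sigma_B^{-1})$ for independent Gaussian noise $B$. All function names (`estimate_output_covariance`, `gaussian_mi_upper_bound`, `smallest_noise_std`) are hypothetical. The empirical covariance is only a Monte Carlo estimate, so this sketch carries no confidence guarantee on its own.

```python
import numpy as np


def estimate_output_covariance(mechanism, sample_input, n_trials, rng):
    """Monte Carlo estimate of the covariance of a black-box mechanism's output.

    `mechanism` maps an input to a vector; `sample_input(rng)` draws one input
    from the (assumed known) data distribution. Both are treated as oracles.
    """
    outputs = np.stack(
        [np.atleast_1d(mechanism(sample_input(rng))) for _ in range(n_trials)]
    )
    # rowvar=False: each row is one trial, each column one output coordinate.
    return np.atleast_2d(np.cov(outputs, rowvar=False))


def gaussian_mi_upper_bound(cov, noise_std):
    """Upper bound (in nats) on I(Z; Z + B) for B ~ N(0, noise_std^2 * I)."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(d) + cov / noise_std**2)
    return 0.5 * logdet


def smallest_noise_std(cov, mi_budget, lo=1e-6, hi=1e6, iters=100):
    """Bisection (in log-space) for a noise level whose bound meets the budget.

    The bound decreases monotonically in noise_std, so `hi` always satisfies
    the budget and `lo` always violates it (assuming the initial bracket does).
    """
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if gaussian_mi_upper_bound(cov, mid) > mi_budget:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.normal(size=(200, 3))  # hypothetical data pool

    def sample_input(rng):
        # Assumed input distribution: a uniformly random half of the pool.
        return pool[rng.choice(len(pool), size=100, replace=False)]

    cov = estimate_output_covariance(lambda x: x.mean(axis=0), sample_input, 500, rng)
    std = smallest_noise_std(cov, mi_budget=0.1)
    print("noise std:", std, "bound:", gaussian_mi_upper_bound(cov, std))
```

The noise level here depends on the estimated output covariance of the specific mechanism and input distribution rather than on a worst-case sensitivity, which is the instance-based flavor the abstract describes.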