We propose and study a new privacy definition, termed Probably Approximately Correct (PAC) Security. PAC security characterizes the information-theoretic hardness of recovering sensitive data given arbitrary information disclosure or leakage during or after any processing. Unlike classic cryptographic definitions and Differential Privacy (DP), which consider the adversarial (input-independent) worst case, PAC security is a simulatable metric that quantifies the instance-based impossibility of inference. We propose a fully automatic analysis and proof generation framework: security parameters can be produced with arbitrarily high confidence via Monte-Carlo simulation for any black-box data processing oracle. This automation enables the analysis of complicated data processing, for which worst-case proofs in the classic privacy regime can be loose or even intractable. Moreover, we show that the resulting PAC security guarantees enjoy simple composition bounds, and that the automatic analysis framework can be implemented in an online fashion to analyze the composite PAC security loss even under correlated randomness. On the utility side, the magnitude of (necessary) perturbation required in PAC security is not lower bounded by $\Theta(\sqrt{d})$ for a $d$-dimensional release but can be $O(1)$ for many practical data processing tasks, in contrast to the input-independent worst-case information-theoretic lower bound. Example applications of PAC security are included, with comparisons to existing work.
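To make the idea of Monte-Carlo analysis of a black-box oracle more concrete, the following is a minimal sketch, not the framework's actual procedure. It assumes a deterministic mechanism, isotropic Gaussian noise added independently of the input, and mutual information as the leakage measure; under those assumptions the standard maximum-entropy argument gives $I(X; M(X)+B) \le \frac{1}{2}\log\det(I_d + \Sigma_{M(X)}/\sigma^2)$. The names `estimate_output_covariance`, `gaussian_mi_bound`, `sample_input`, and the toy mean-release mechanism are all hypothetical. The covariance here is only an empirical estimate, so the sketch does not by itself carry the high-confidence guarantee the abstract refers to.

```python
import numpy as np


def estimate_output_covariance(mechanism, sample_input, n_trials, rng):
    """Monte-Carlo estimate of the covariance of mechanism(X), X ~ sample_input.

    The mechanism is treated as a black box: it is only ever called, never inspected.
    """
    outputs = np.stack(
        [np.atleast_1d(mechanism(sample_input(rng))) for _ in range(n_trials)]
    )
    # rowvar=False: each row is one trial, each column one output coordinate.
    return np.atleast_2d(np.cov(outputs, rowvar=False))


def gaussian_mi_bound(output_cov, noise_var):
    """Upper bound (in nats) on I(X; M(X) + B) for B ~ N(0, noise_var * I).

    Assumes M is deterministic and B is independent of X (illustrative assumption).
    """
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    d = output_cov.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(d) + output_cov / noise_var)
    return 0.5 * logdet


# Toy instance (hypothetical): release the mean of 100 records in 5 dimensions.
rng = np.random.default_rng(0)


def sample_input(r):
    return r.normal(size=(100, 5))


def mechanism(data):
    return data.mean(axis=0)


cov = estimate_output_covariance(mechanism, sample_input, n_trials=2000, rng=rng)
bound = gaussian_mi_bound(cov, noise_var=0.01)
print(f"estimated leakage bound: {bound:.3f} nats")
```

In this toy instance the output of the mean is already concentrated, so a per-coordinate noise variance that does not grow with the dimension keeps the bound small, which is the kind of instance-based behavior the abstract contrasts with input-independent worst-case analysis.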