When creating public data products out of confidential datasets, inferential/posterior-based privacy definitions, such as Pufferfish, provide compelling privacy semantics for data with correlations. However, such privacy definitions are rarely used in practice because they do not always compose. For example, it is possible to design algorithms for these privacy definitions that have no leakage when run once but reveal the entire dataset when run more than once. We prove necessary and sufficient conditions that must be added to ensure linear composition for Pufferfish mechanisms, hence avoiding such privacy collapse. These extra conditions turn out to be differential privacy-style inequalities, indicating that achieving both the interpretable semantics of Pufferfish for correlated data and the benefits of composition requires adapting differentially private mechanisms to Pufferfish. We show that such a translation is possible through a concept called the $a(b)$-influence curve, and that many existing differentially private algorithms can be translated with our framework into composable Pufferfish algorithms. We illustrate the benefit of our new framework by designing composable Pufferfish algorithms for Markov chains that significantly outperform prior work.
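The composition failure described above can be illustrated with a toy sketch. The construction below is an assumption made for illustration and is not taken from the paper: the data consist of a secret bit `s` and an auxiliary, non-secret bit `k`, both uniform and independent under the prior, and a hypothetical mechanism flips a fair coin to release either `s XOR k` or `k`. Exact enumeration suggests that a single run has an output distribution that does not depend on `s` (likelihood ratio 1, i.e. no leakage), while two runs can produce a transcript that is impossible under one of the two secret values.

```python
from fractions import Fraction
from itertools import product


def mechanism_outputs(s, k):
    """Hypothetical toy mechanism (not from the paper).

    Flips a fair coin and releases either (0, s XOR k) or (1, k).
    Returns the output distribution for fixed data (s, k).
    """
    return {(0, s ^ k): Fraction(1, 2), (1, k): Fraction(1, 2)}


def output_dist(s, runs):
    """Distribution of the transcript of `runs` independent runs on the same
    data, conditioned on the secret s, with k uniform under the assumed prior."""
    dist = {}
    for k in (0, 1):
        single = mechanism_outputs(s, k)
        for transcript in product(single, repeat=runs):
            p = Fraction(1, 2)  # prior probability of this value of k
            for out in transcript:
                p *= single[out]
            dist[transcript] = dist.get(transcript, 0) + p
    return dist


def max_ratio(runs):
    """Worst-case likelihood ratio between s = 0 and s = 1 over all transcripts.

    A value of 1 means the transcript carries no information about s;
    infinity means some transcript rules out one value of s entirely.
    """
    d0, d1 = output_dist(0, runs), output_dist(1, runs)
    worst = Fraction(1)
    for t in set(d0) | set(d1):
        p0, p1 = d0.get(t, 0), d1.get(t, 0)
        if p0 == 0 or p1 == 0:
            return float("inf")
        worst = max(worst, p0 / p1, p1 / p0)
    return worst


if __name__ == "__main__":
    print("worst-case ratio, 1 run :", max_ratio(1))
    print("worst-case ratio, 2 runs:", max_ratio(2))
```

In this toy setting, the unbounded ratio after two runs arises because an observer who sees both `s XOR k` and `k` can recover `s` exactly, which is the kind of collapse that the added differential privacy-style conditions are meant to exclude.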