When creating public data products out of confidential datasets, inferential/posterior-based privacy definitions, such as Pufferfish, provide compelling privacy semantics for data with correlations. However, such privacy definitions are rarely used in practice because they do not always compose. For example, it is possible to design algorithms for these privacy definitions that have no leakage when run once but reveal the entire dataset when run more than once. We prove necessary and sufficient conditions that must be added to ensure linear composition for Pufferfish mechanisms, hence avoiding such privacy collapse. These extra conditions turn out to be differential privacy-style inequalities, indicating that achieving both the interpretable semantics of Pufferfish for correlated data and the benefits of composition requires adapting differentially private mechanisms to Pufferfish. We show that such a translation is possible through a concept called the $(a,b)$-influence curve, and that many existing differentially private algorithms can be translated with our framework into composable Pufferfish algorithms. We illustrate the benefit of our new framework by designing composable Pufferfish algorithms for Markov chains that significantly outperform prior work.
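The composition failure described above — mechanisms that leak nothing individually but everything jointly — can be illustrated with a simple one-time-pad-style sketch. This is our own hedged illustration of the phenomenon, not the paper's construction: each of the two releases below is, on its own, statistically independent of the data, yet together they reconstruct it exactly.

```python
import secrets

def release_pad(data: bytes, key: bytes) -> bytes:
    # XOR the confidential data with a uniformly random key (a one-time pad);
    # this output alone is uniformly distributed, so a single release leaks nothing.
    return bytes(d ^ k for d, k in zip(data, key))

def release_key(key: bytes) -> bytes:
    # Releasing only the key is also uninformative about the data by itself.
    return key

data = b"secret"
key = secrets.token_bytes(len(data))

out1 = release_pad(data, key)   # uniform on its own: zero leakage
out2 = release_key(key)         # independent of the data on its own

# But combining the two releases reveals the entire dataset:
recovered = bytes(a ^ b for a, b in zip(out1, out2))
assert recovered == data
```

This is exactly the kind of privacy collapse that the paper's added differential privacy-style conditions rule out: each release satisfies a posterior-style guarantee in isolation, but the pair does not compose.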