Differential Privacy (DP) considers a scenario in which an adversary has almost complete information about the entries of a database. This worst-case assumption is likely to overestimate the privacy threat faced by an individual in practice. In contrast, Statistical Privacy (SP), like related notions such as noiseless privacy or limited background knowledge privacy, describes a setting in which the adversary knows the distribution of the database entries but not their exact realizations. In this setting, privacy analysis must account for the interaction between the uncertainty induced by the entropy of the underlying distributions and the privacy mechanisms that distort query answers, an interaction that can be highly non-trivial. This paper investigates this problem for multiple queries (composition). We propose a privacy mechanism that subsamples and randomly partitions the database in order to bound the dependency among queries. To the best of our knowledge, this yields the first upper privacy bounds against limited adversaries that hold without any further restriction on the database. These bounds show that, in realistic application scenarios, taking the entropy of the distributions into account improves both privacy and precision guarantees. We give examples in which, for fixed privacy parameters and utility loss, SP allows significantly more queries than DP.
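The idea of subsampling and randomly partitioning the database can be illustrated with a minimal sketch. This is not the mechanism analyzed in the paper, whose details are not given in the abstract. It assumes that each query is answered on its own disjoint block of the database and that every record in a block is kept independently with a fixed probability. The function names `partition_and_subsample` and `answer_queries` and the parameter `sample_rate` are hypothetical and introduced only for illustration.

```python
import random


def partition_and_subsample(database, num_queries, sample_rate, rng):
    """Split the records into disjoint random blocks, then subsample each block.

    Assumption (not from the source): one block per query, and independent
    per-record sampling with probability `sample_rate`.
    """
    indices = list(range(len(database)))
    rng.shuffle(indices)  # random assignment of records to blocks
    # Block i receives every num_queries-th shuffled index, starting at i.
    blocks = [indices[i::num_queries] for i in range(num_queries)]
    subsamples = []
    for block in blocks:
        # Keep each record of the block independently with probability sample_rate.
        kept = [database[i] for i in block if rng.random() < sample_rate]
        subsamples.append(kept)
    return subsamples


def answer_queries(database, queries, sample_rate, seed=0):
    """Evaluate each query on its own subsampled block (hypothetical helper)."""
    rng = random.Random(seed)
    parts = partition_and_subsample(database, len(queries), sample_rate, rng)
    return [query(part) for query, part in zip(queries, parts)]
```

Because the blocks are disjoint, no record contributes to more than one answer, which is one simple way to limit the dependency among queries. The sketch leaves out any rescaling of the answers and any privacy accounting.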