Private closeness testing asks whether the probability distributions underlying two sensitive datasets are identical or differ significantly in statistical distance, while guaranteeing (differential) privacy of the data. As in most (if not all) distribution testing questions studied under privacy constraints, however, previous work assumes that the two datasets are equally sensitive, i.e., that they must be given the same privacy guarantees. This assumption is often unrealistic, since different sources of data come with different privacy requirements; as a result, known closeness testing algorithms might be unnecessarily conservative, "paying" too high a privacy budget for half of the data. In this work, we initiate the study of the closeness testing problem under heterogeneous privacy constraints, where the two datasets come with distinct privacy requirements. We formalize the question and provide algorithms under the three most widely used differential privacy settings, with a particular focus on the local and shuffle models of privacy, and we show that one can indeed achieve better sample efficiency by taking into account the two different "epsilon" requirements.