The shuffle model of Differential Privacy (DP) is an enhanced privacy protocol that introduces an intermediate trusted server between local users and a central data curator. By anonymizing and shuffling the locally randomized data, it significantly amplifies the central DP guarantee. Yet, deriving a tight privacy bound is challenging due to the complicated randomization protocol. While most existing work focuses on a uniform local privacy setting, this work derives the central privacy bound for a more practical setting in which each user requires a personalized level of local privacy. To bound the privacy after shuffling, we first need to capture the probability of each user generating clones of the neighboring data points, and second, to quantify the indistinguishability between the two distributions of the number of clones on neighboring datasets. Existing works either capture this probability inaccurately or underestimate the indistinguishability between neighboring datasets. Motivated by this, we develop a more precise analysis, which yields a general and tighter bound for arbitrary DP mechanisms. First, we derive the clone-generating probability by hypothesis testing from a randomizer-specific perspective, which leads to a more accurate characterization of the probability. Second, we analyze the indistinguishability in the context of $f$-DP, where the convexity of the distributions is leveraged to achieve a tighter privacy bound. Theoretical and numerical results demonstrate that our bound remarkably outperforms existing results in the literature.
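To make the protocol concrete, the pipeline described above (each user applies a local randomizer with a personalized privacy budget, and a trusted shuffler then permutes the reports before forwarding them to the curator) can be sketched as follows. This is a minimal illustration using binary randomized response as the local mechanism; the function names and the choice of randomizer are ours, not the paper's.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Binary randomized response satisfying local epsilon-DP:
    report the true bit with probability e^eps / (e^eps + 1),
    otherwise report the flipped bit."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def shuffle_mechanism(bits, epsilons, seed=None):
    """Personalized-local-DP shuffle model (illustrative sketch):
    user i randomizes with her own budget epsilons[i]; the trusted
    shuffler then uniformly permutes the reports, severing the
    link between users and messages before the curator sees them."""
    rng = random.Random(seed)
    reports = [randomized_response(b, e, rng) for b, e in zip(bits, epsilons)]
    rng.shuffle(reports)  # anonymization step that amplifies central DP
    return reports


# Each user holds one bit and a personalized local privacy budget.
data = [0, 1, 1, 0, 1]
budgets = [0.5, 1.0, 2.0, 4.0, 8.0]
shuffled_reports = shuffle_mechanism(data, budgets, seed=0)
```

The curator only observes the multiset of shuffled reports, which is exactly why the privacy analysis reduces to comparing distributions over counts ("clones") on neighboring datasets.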