Tight Differential Privacy Guarantees for the Shuffle Model with $k$-Randomized Response

Most differentially private (DP) algorithms assume either a central model, in which a trusted third party adds noise to the answers of queries made on datasets, or a local model, in which users perturb their data locally. However, the central model is vulnerable to a single point of failure, and in the local model the utility of the data deteriorates significantly. The recently proposed shuffle model is an intermediate framework between the central and the local paradigms: users send their locally privatized data to a server where the messages are shuffled, effacing the link between a privatized message and the corresponding user. This gives a better trade-off between privacy and utility than the local model, because privacy is amplified without adding more noise. In this paper, we theoretically derive the tightest known bound on the DP guarantee of the shuffle model with $k$-Randomized Response ($k$-RR) local randomizers. We then focus on the utility of the shuffle model for histogram queries. Leveraging the matrix inversion method, which approximates the original distribution from the empirical one produced by the $k$-RR mechanism, we de-noise the histogram produced by the shuffle model and evaluate the total variation distance between the resulting histogram and the true one, which we regard as the measure of utility of the privacy mechanism. We perform experiments on both synthetic and real data to compare the privacy-utility trade-off of the shuffle model with that of the central model privatized by adding state-of-the-art Gaussian noise to each bin. Although the experimental results are consistent with the literature, which favours the central model, the difference in statistical utility between the central and the shuffle models is very small, showing that the two are almost comparable under the same level of DP.
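To make the pipeline described above concrete, the following is a minimal Python sketch of $k$-RR local randomization, shuffling, matrix-inversion de-noising, and the total variation distance. It is an illustration under stated assumptions, not the paper's implementation: the function names (`krr_matrix`, `krr_shuffle`, `invert_histogram`, `total_variation`) and the parameter values are hypothetical, `eps` denotes the privacy parameter of the local randomizer rather than the amplified guarantee derived in the paper, and the estimate is not projected back onto the probability simplex, so it may contain small negative entries.

```python
import numpy as np


def krr_matrix(k, eps):
    """k x k channel matrix of k-RR: entry (i, j) is P(report j | true value i)."""
    p = np.exp(eps) / (np.exp(eps) + k - 1)  # probability of reporting the true value
    q = 1.0 / (np.exp(eps) + k - 1)          # probability of each other value
    return np.full((k, k), q) + (p - q) * np.eye(k)


def krr_shuffle(data, k, eps, rng):
    """Apply k-RR to every user's value in {0, ..., k-1}, then shuffle the reports."""
    n = data.size
    p_keep = np.exp(eps) / (np.exp(eps) + k - 1)
    keep = rng.random(n) < p_keep
    # An offset in {1, ..., k-1} modulo k picks a *different* value uniformly.
    offsets = rng.integers(1, k, size=n)
    reports = np.where(keep, data, (data + offsets) % k)
    # The uniformly random permutation removes the link between user and report.
    return rng.permutation(reports)


def invert_histogram(reports, k, eps):
    """Matrix inversion: solve C^T x = empirical distribution of the reports."""
    empirical = np.bincount(reports, minlength=k) / reports.size
    return np.linalg.solve(krr_matrix(k, eps).T, empirical)


def total_variation(p, q):
    """Half the L1 distance between two vectors over the same finite domain."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, eps, n = 4, 2.0, 100_000                # illustrative values only
    true_dist = np.array([0.4, 0.3, 0.2, 0.1])
    data = rng.choice(k, size=n, p=true_dist)  # synthetic users' values
    true_hist = np.bincount(data, minlength=k) / n
    estimate = invert_histogram(krr_shuffle(data, k, eps, rng), k, eps)
    print("TV distance:", total_variation(estimate, true_hist))
```

Because a histogram does not depend on the order of the reports, shuffling leaves the de-noised estimate unchanged; in this sketch it only models the anonymization step that the privacy analysis relies on.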
