Analyzing the Shuffle Model through the Lens of Quantitative Information Flow

Local differential privacy (LDP) is a variant of differential privacy (DP) that avoids the need for a trusted central curator, at the cost of a worse trade-off between privacy and utility. The shuffle model gives users greater anonymity by randomly permuting their messages, so that the data collector loses the link between users and their reported values. Combining an LDP mechanism with a shuffler improves privacy at no cost to the accuracy of operations that are insensitive to permutations, thereby improving utility in many tasks. However, the privacy implications of shuffling are not always evident, and privacy bounds are derived on a case-by-case basis. In this paper, we analyze the combination of LDP with shuffling in the rigorous framework of quantitative information flow (QIF) and reason about the resulting resilience to inference attacks. QIF naturally models randomization mechanisms as information-theoretic channels. This allows a variety of inference attacks to be modeled precisely and the leakage of private information under these attacks to be measured. We exploit symmetries of the particular combination of k-RR mechanisms with the shuffle model to obtain closed formulas that express leakage exactly. In particular, we provide formulas that show how shuffling improves protection against leaks in the local model, and we study how leakage behaves for various values of the privacy parameter of the LDP mechanism. In contrast to the strong adversary of differential privacy, we focus on an uninformed adversary, who does not know the value of any individual in the dataset. This adversary is often a more realistic model of a consumer of statistical datasets, and we show that in some situations mechanisms that are equivalent with respect to the strong adversary can provide different privacy guarantees against the uninformed one.
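The following Python sketch illustrates how QIF treats these mechanisms as channels. It builds the k-RR channel, the channel from a small dataset to its tuple of local reports, and the channel to the shuffled reports, with shuffling modeled as releasing only the histogram of reported values. It then computes multiplicative Bayes leakage under a uniform prior. Several assumptions apply: the standard k-RR definition, the whole dataset as the secret, a uniform prior, and brute-force enumeration that is only feasible for tiny n and k. The function names are hypothetical. This is not the paper's closed-form analysis.

```python
import itertools
import math
from collections import defaultdict


def krr_channel(k, eps):
    """k-ary randomized response (assumed standard definition): report the true
    value with probability e^eps / (e^eps + k - 1), otherwise each other value
    with probability 1 / (e^eps + k - 1). Returns a k x k row-stochastic matrix."""
    denom = math.exp(eps) + k - 1
    p_true = math.exp(eps) / denom
    p_other = 1.0 / denom
    return [[p_true if x == y else p_other for y in range(k)] for x in range(k)]


def bayes_capacity(rows):
    """Multiplicative Bayes leakage under a uniform prior, i.e. the sum over
    outputs y of max over secrets x of C[x][y]. `rows` maps each secret to a
    dict {output: probability}."""
    outputs = set()
    for row in rows.values():
        outputs.update(row)
    return sum(max(row.get(y, 0.0) for row in rows.values()) for y in outputs)


def dataset_channels(k, n, eps):
    """Channels from a dataset (x_1, ..., x_n), each user applying k-RR
    independently, to (a) the tuple of local reports and (b) the shuffled
    reports, represented as a histogram of counts per value."""
    c = krr_channel(k, eps)
    local, shuffled = {}, {}
    for xs in itertools.product(range(k), repeat=n):
        loc = {}
        shuf = defaultdict(float)
        for ys in itertools.product(range(k), repeat=n):
            p = math.prod(c[x][y] for x, y in zip(xs, ys))
            loc[ys] = p
            hist = tuple(ys.count(v) for v in range(k))  # order is forgotten
            shuf[hist] += p
        local[xs] = loc
        shuffled[xs] = dict(shuf)
    return local, shuffled


if __name__ == "__main__":
    k, n, eps = 3, 3, 1.0
    single = {x: dict(enumerate(row)) for x, row in enumerate(krr_channel(k, eps))}
    local, shuffled = dataset_channels(k, n, eps)
    print(bayes_capacity(single))    # equals k * e^eps / (e^eps + k - 1)
    print(bayes_capacity(local))     # equals the single-user value raised to n
    print(bayes_capacity(shuffled))  # at most the local value (post-processing)
```

Shuffling is modeled here as a deterministic post-processing of the local reports. By the data-processing inequality, the shuffled leakage can therefore never exceed the local leakage. The brute-force enumeration grows as k^(2n), which is why exploiting symmetries to obtain closed formulas matters.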
