Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation

Shuffler-based differential privacy (shuffle-DP) is a privacy paradigm that achieves high utility by introducing a shuffler that permutes users' noisy reports. Existing shuffle-DP protocols mainly focus on designing shuffler-based categorical frequency oracles (SCFOs) for frequency estimation on categorical data. However, numerical data is more prevalent, and many real-world applications depend on estimating the distribution of data with an ordinal nature. In this paper, we study distribution estimation under the pure shuffle model, a prevalent shuffle-DP framework that does not rely on strong security assumptions. We first adapt existing SCFOs and the naïve distribution recovery technique to this task, and demonstrate that these baseline protocols cannot simultaneously perform well on three metrics: 1) utility, 2) message complexity, and 3) robustness to data poisoning attacks. We therefore propose a novel single-message \textit{adaptive shuffler-based piecewise} (ASP) protocol with high utility and robustness. In ASP, we first develop a randomizer whose parameters are optimized using our proposed tighter bound on mutual information. We also design an \textit{Expectation Maximization with Adaptive Smoothing} (EMAS) algorithm to recover the distribution accurately with enhanced robustness. To quantify robustness, we propose a new evaluation framework that examines robustness under different attack targets, enabling a comprehensive understanding of protocol resilience across adversarial scenarios. Extensive experiments demonstrate that ASP outperforms the baseline protocols on all three metrics. In particular, for small $ε$ values, ASP improves utility by an order of magnitude with minimal message complexity and is more than three times as robust as the baseline methods.
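To make the pipeline described above concrete, the following is a minimal sketch of a generic single-message flow: each user sends one randomized report, a shuffler permutes the reports, and the analyzer recovers a histogram with expectation maximization followed by smoothing. It is not the ASP protocol or the EMAS algorithm themselves. The square-wave-style piecewise randomizer, the fixed half-width `b`, the fixed `[1/4, 1/2, 1/4]` smoothing kernel, the bin counts, and all function names are assumptions chosen for illustration; the abstract states that ASP optimizes its randomizer parameters with a mutual-information bound and that its smoothing is adaptive, neither of which is reproduced here.

```python
import math
import random

def randomize(v, eps, b, rng):
    """Assumed piecewise randomizer on [0, 1]: dense near v, sparse elsewhere."""
    p = math.exp(eps) / (2 * b * math.exp(eps) + 1)  # density on [v-b, v+b]
    if rng.random() < 2 * b * p:
        return rng.uniform(v - b, v + b)
    u = rng.random()  # the low-density region has total length 1
    return u - b if u < v else u + b

def transition_matrix(d_in, d_out, eps, b):
    """M[j][i] = P(report falls in output bin j | input is the centre of bin i)."""
    p = math.exp(eps) / (2 * b * math.exp(eps) + 1)
    q = p / math.exp(eps)
    width = (1 + 2 * b) / d_out
    M = [[0.0] * d_in for _ in range(d_out)]
    for i in range(d_in):
        c = (i + 0.5) / d_in
        for j in range(d_out):
            lo = -b + j * width
            hi = lo + width
            overlap = max(0.0, min(hi, c + b) - max(lo, c - b))
            M[j][i] = q * width + (p - q) * overlap
    return M

def smooth(x):
    """Fixed [1/4, 1/2, 1/4] kernel with reflected edges (a stand-in, not adaptive)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else x[i]
        right = x[i + 1] if i < n - 1 else x[i]
        y.append(0.25 * left + 0.5 * x[i] + 0.25 * right)
    total = sum(y)
    return [val / total for val in y]

def em_estimate(reports, d_in, d_out, eps, b, iters=100):
    """EM over the shuffled reports, with one smoothing pass per iteration."""
    M = transition_matrix(d_in, d_out, eps, b)
    width = (1 + 2 * b) / d_out
    counts = [0] * d_out
    for r in reports:
        j = min(d_out - 1, max(0, int((r + b) / width)))
        counts[j] += 1
    n = len(reports)
    x = [1.0 / d_in] * d_in
    for _ in range(iters):
        denom = [sum(M[j][i] * x[i] for i in range(d_in)) for j in range(d_out)]
        x = [x[i] * sum(counts[j] * M[j][i] / denom[j] for j in range(d_out)) / n
             for i in range(d_in)]
        x = smooth(x)
    return x

def demo(seed=0, n_users=2000, eps=1.0, b=0.25, d_in=16, d_out=32):
    rng = random.Random(seed)
    values = [rng.betavariate(2, 5) for _ in range(n_users)]  # synthetic inputs
    reports = [randomize(v, eps, b, rng) for v in values]     # one message per user
    rng.shuffle(reports)                                      # the shuffler
    return em_estimate(reports, d_in, d_out, eps, b)
```

Calling `demo()` returns a 16-bin estimate of the input distribution. Because the analyzer only uses bin counts, the estimate does not depend on the order of the reports; in the shuffle model the permutation matters for the privacy analysis, which this sketch does not attempt to model.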
