We introduce a differentially private (DP) algorithm called Reveal-or-Obscure (ROO) to generate a single representative sample from a dataset of $n$ i.i.d. observations from an unknown distribution. Unlike methods that add explicit noise to the estimated empirical distribution, ROO achieves $ε$-differential privacy by choosing whether to "reveal" or "obscure" the empirical distribution with a fixed probability $q$. While our proposed mechanism is structurally identical to an algorithm proposed by Cheu and Nayak, we prove a strictly better bound on the sampling complexity than that established in their theorem. Building on this framework, we propose a novel generalized sampler called Data-Specific ROO (DS-ROO), where the obscuring probability $q$ is a function of the empirical distribution. We show that when the dataset contains enough samples from every element of the alphabet, DS-ROO can achieve $ε$-DP while obscuring much less. In addition, we provide tight upper bounds on the utility of DS-ROO in terms of total variation distance. Our results show that under the same privacy budget, DS-ROO can achieve better utility than state-of-the-art private samplers and vanilla ROO, with total variation distance decaying exponentially in dataset size $n$.
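As a rough illustration of the reveal-or-obscure idea, the sketch below draws a single sample over a finite alphabet. It assumes that "obscure" means sampling uniformly from the alphabet and "reveal" means sampling from the empirical distribution; the abstract does not spell out either step, so this is an assumption rather than the authors' exact construction. The function names (`roo_sample`, `roo_output_distribution`, `total_variation`) are hypothetical, and the value of $q$ needed for $ε$-DP is deliberately left as an input rather than guessed.

```python
import random
from collections import Counter


def empirical_distribution(data, alphabet):
    """Return the empirical distribution of `data` over `alphabet` as a dict."""
    counts = Counter(data)
    n = len(data)
    return {a: counts[a] / n for a in alphabet}


def roo_sample(data, alphabet, q, rng=random):
    """Draw one sample: obscure with probability q, otherwise reveal.

    Assumption: obscuring is modeled as a uniform draw from the alphabet.
    """
    if rng.random() < q:
        return rng.choice(alphabet)  # obscure (assumed uniform)
    # Reveal: a uniformly chosen data point follows the empirical distribution.
    return rng.choice(data)


def roo_output_distribution(data, alphabet, q):
    """Exact output distribution of the sketch: a q-mixture of uniform and empirical."""
    p_hat = empirical_distribution(data, alphabet)
    k = len(alphabet)
    return {a: q / k + (1 - q) * p_hat[a] for a in alphabet}


def total_variation(p, r):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(p[a] - r[a]) for a in p)


if __name__ == "__main__":
    data = ["a", "a", "b", "c"]
    alphabet = ["a", "b", "c"]
    q = 0.3  # illustrative value only, not calibrated to any privacy budget
    print(roo_sample(data, alphabet, q))
    print(roo_output_distribution(data, alphabet, q))
```

Under these assumptions, the output distribution differs from the empirical distribution by exactly $q$ times the total variation distance between the uniform and empirical distributions, which makes concrete why a smaller obscuring probability gives better utility.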