We introduce a differentially private (DP) algorithm called reveal-or-obscure (ROO) that generates a single representative sample from a dataset of $n$ observations drawn i.i.d. from an unknown discrete distribution $P$. Unlike methods that add explicit noise to the estimated empirical distribution, ROO achieves $ε$-differential privacy by randomly choosing whether to "reveal" or "obscure" the empirical distribution. Although ROO is structurally identical to Algorithm 1 of Cheu and Nayak (arXiv:2412.10512), we prove a strictly better bound on the sampling complexity than the one established in Theorem 12 of that paper. To further improve the privacy-utility trade-off, we propose a novel generalized sampling algorithm called Data-Specific ROO (DS-ROO), in which the probability of obscuring the empirical distribution of the dataset is chosen adaptively. We prove that DS-ROO satisfies $ε$-DP, and provide empirical evidence that DS-ROO can achieve better utility than vanilla ROO under the same privacy budget.
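To make the reveal-or-obscure idea concrete, the following is a minimal sketch and not the algorithm as specified in the paper. It assumes that "obscuring" means drawing uniformly from a known finite alphabet of size $k$, that "revealing" means drawing from the empirical distribution, and that neighbouring datasets differ by replacing one observation. Under those assumptions, the obscuring probability $q = k / (k + n(e^{ε} - 1))$ is the smallest value that keeps every output-probability ratio within $e^{ε}$. The names `obscure_probability`, `roo_sample`, and `output_distribution` are hypothetical.

```python
import math
import random
from collections import Counter


def obscure_probability(n, k, epsilon):
    """Assumed obscuring probability q = k / (k + n (e^eps - 1)).

    This is the smallest q for which the mixture (1 - q) * empirical + q * uniform
    is eps-DP under replace-one neighbours; it is an illustrative choice, not
    necessarily the one used in the paper.
    """
    return k / (k + n * (math.exp(epsilon) - 1.0))


def roo_sample(data, alphabet, epsilon, rng=random):
    """Return one sample: obscure with probability q, otherwise reveal."""
    q = obscure_probability(len(data), len(alphabet), epsilon)
    if rng.random() < q:
        return rng.choice(alphabet)  # obscure: ignore the data entirely
    return rng.choice(data)          # reveal: draw from the empirical distribution


def output_distribution(data, alphabet, epsilon):
    """Exact output probabilities of roo_sample, used to check the DP ratio."""
    n, k = len(data), len(alphabet)
    q = obscure_probability(n, k, epsilon)
    counts = Counter(data)
    return {a: (1.0 - q) * counts[a] / n + q / k for a in alphabet}
```

Comparing `output_distribution` on two datasets that differ in one observation gives a direct numerical check that no symbol's probability changes by more than a factor of $e^{ε}$. The worst case is a symbol that appears once in one dataset and not at all in the other.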