This paper considers the $\varepsilon$-differentially private (DP) release of an approximate cumulative distribution function (CDF) of the samples in a dataset. We assume that the true (approximate) CDF is obtained by lumping the data samples into a fixed number $K$ of bins. In this work, we extend the well-known binary tree mechanism to the class of \emph{level-uniform tree-based} mechanisms and identify $\varepsilon$-DP mechanisms with small $\ell_2$-error. We identify optimal or near-optimal tree structures when either set of parameters, namely the branching factors or the per-level privacy budgets, is given, and when the algorithm designer is free to choose both sets of parameters. Interestingly, when the branching factors are allowed to take real values, under certain mild restrictions, the optimal level-uniform tree-based mechanism is obtained by choosing equal branching factors \emph{independent} of $K$ and equal privacy budgets at all levels. Furthermore, for selected values of $K$, we explicitly identify the optimal \emph{integer} branching factors and tree height, assuming equal privacy budgets at all levels. Finally, we describe general strategies for further improving the private CDF estimates, by combining multiple noisy estimates and by post-processing the estimates for consistency.
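To make the setting concrete, the following is a minimal, hedged sketch of a level-uniform tree-based mechanism with a uniform branching factor and equal per-level privacy budgets, in the spirit of the binary tree mechanism the abstract generalizes. All function and parameter names here are illustrative assumptions, not the paper's notation; the paper's optimality analysis and post-processing steps are not reproduced.

```python
import math
import random


def private_cdf(counts, eps, branch=2, seed=0):
    """Illustrative sketch: release a noisy cumulative histogram under
    pure eps-DP via a level-uniform tree-based mechanism.

    A `branch`-ary tree is built over the K bins, the total budget `eps`
    is split equally across the tree levels, and each node count receives
    independent Laplace noise. A prefix (CDF) query is then answered by
    summing a small number of noisy block sums per level.
    """
    rng = random.Random(seed)
    K = len(counts)
    height = max(1, math.ceil(math.log(K, branch)))
    size = branch ** height
    padded = list(counts) + [0] * (size - K)  # pad to a power of `branch`

    # Equal per-level budgets: each level is (eps/height)-DP, and one record
    # touches exactly one node per level (per-level sensitivity 1), so the
    # whole release is eps-DP by basic composition.
    scale = height / eps  # Laplace scale 1 / (eps / height)

    # Noisy block sums per level, largest blocks first: widths b^(h-1), ..., 1.
    levels = []
    for l in range(height - 1, -1, -1):
        width = branch ** l
        blocks = []
        for start in range(0, size, width):
            true_sum = sum(padded[start:start + width])
            # Laplace(0, scale) as the difference of two Exp(scale) draws.
            lap = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            blocks.append(true_sum + lap)
        levels.append((width, blocks))

    # Answer every prefix query greedily from the largest blocks down,
    # so each query touches at most `branch` noisy blocks per level.
    cdf = []
    for i in range(K):
        n, offset, est = i + 1, 0, 0.0
        for width, blocks in levels:
            take = (n - offset) // width
            first = offset // width
            est += sum(blocks[first:first + take])
            offset += take * width
        cdf.append(est)
    return cdf
```

With `branch=2` this reduces to the standard binary tree mechanism; the abstract's results concern how to choose the branching factors, tree height, and per-level budgets to minimize the $\ell_2$-error of such estimates.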