This paper considers the $\varepsilon$-differentially private (DP) release of an approximate cumulative distribution function (CDF) of the samples in a dataset. We assume that the true (approximate) CDF is obtained after lumping the data samples into a fixed number $K$ of bins. In this work, we extend the well-known binary tree mechanism to the class of \emph{level-uniform tree-based} mechanisms and identify $\varepsilon$-DP mechanisms with small $\ell_2$-error. We identify optimal or close-to-optimal tree structures when either set of parameters (the branching factors or the privacy budgets at each tree level) is given, as well as when the algorithm designer is free to choose both sets of parameters. Interestingly, when we allow the branching factors to take on real values, under certain mild restrictions, the optimal level-uniform tree-based mechanism is obtained by choosing equal branching factors \emph{independent} of $K$, and equal privacy budgets at all levels. Furthermore, for selected values of $K$, we explicitly identify the optimal \emph{integer} branching factors and tree height, assuming equal privacy budgets at all levels. Finally, we describe general strategies for further improving the private CDF estimates, by combining multiple noisy estimates and by post-processing the estimates for consistency.