There are many existing differentially private algorithms for releasing histograms, i.e. counts with corresponding labels, in various settings. Our focus in this survey is to revisit some of the existing differentially private algorithms for releasing histograms over unknown domains, i.e. where the labels of the counts to be released are not known beforehand. The main practical advantage of releasing histograms over an unknown domain is that the algorithm does not need to fill in missing labels, that is, labels that are absent from the original histogram but could appear in the histogram of a hypothetical neighboring dataset. However, the challenge in designing differentially private algorithms for releasing histograms over an unknown domain is that some outcomes can reveal exactly which input was used, which is a clear violation of privacy. The goal, then, is to show that these distinguishing outcomes occur with very low probability. We present a unified framework for the privacy analyses of several existing algorithms. Furthermore, our analysis uses approximate concentrated differential privacy from Bun and Steinke'16, which can yield better privacy loss parameters than a direct differential privacy analysis, especially when many of these algorithms are composed together in an overall system.
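To make the unknown-domain setting concrete, the following is a minimal Python sketch of one simple mechanism of this kind. It is an illustration under stated assumptions, not a reproduction of any specific surveyed algorithm. It assumes each user contributes exactly one item, so a neighboring dataset changes a single count by 1. It also assumes Gaussian noise with standard deviation `sigma` and a release threshold `threshold`. All function names are hypothetical. Only labels that occur in the data are ever considered, so no domain is needed. The one distinguishing outcome is releasing a label held only by the differing user, which happens with probability δ = P[1 + N(0, σ²) > T]. Otherwise the noisy counts behave like a Gaussian mechanism with ρ = 1/(2σ²). Treating the mechanism as δ-approximately ρ-zCDP under these assumptions, the (ρ, δ) parameters add under composition. They can then be converted to (ε, δ + δ')-DP via ε = ρ + 2√(ρ log(1/δ')), the zCDP-to-DP conversion of Bun and Steinke'16.

```python
import math
import random
from collections import Counter


def gaussian_unknown_domain_histogram(items, sigma, threshold, rng=None):
    """Noisy counts for labels that actually occur in `items`.

    Hypothetical sketch: assumes each user contributes exactly one item, so
    adding or removing a user changes a single count by 1.
    """
    rng = rng if rng is not None else random.Random()
    released = {}
    for label, count in Counter(items).items():
        noisy = count + rng.gauss(0.0, sigma)
        # Only observed labels are iterated, so no domain is needed; the
        # threshold suppresses labels whose noisy count is small.
        if noisy > threshold:
            released[label] = noisy
    return released


def distinguishing_probability(sigma, threshold):
    """P[1 + N(0, sigma^2) > threshold]: the chance that a label held only
    by the differing user is released, revealing which input was used."""
    return 0.5 * math.erfc((threshold - 1.0) / (sigma * math.sqrt(2.0)))


def compose_approx_zcdp(params):
    """Compose a list of (rho, delta) pairs: both parameters add up."""
    return sum(r for r, _ in params), sum(d for _, d in params)


def approx_zcdp_to_dp(rho, delta, delta_prime):
    """delta-approximate rho-zCDP  =>  (eps, delta + delta_prime)-DP."""
    eps = rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta_prime))
    return eps, delta + delta_prime


if __name__ == "__main__":
    sigma, threshold = 5.0, 30.0
    rho = 1.0 / (2.0 * sigma ** 2)  # Gaussian mechanism with sensitivity 1
    delta = distinguishing_probability(sigma, threshold)
    data = ["a"] * 100 + ["b"] * 3 + ["c"]
    print(gaussian_unknown_domain_histogram(data, sigma, threshold, random.Random(0)))
    # Ten releases composed in approximate zCDP, converted to DP only once.
    total_rho, total_delta = compose_approx_zcdp([(rho, delta)] * 10)
    print(approx_zcdp_to_dp(total_rho, total_delta, delta_prime=1e-6))
```

Composing in approximate zCDP and converting once at the end is where the concentrated analysis tends to help. The resulting ε grows roughly like √k in the number of composed releases k. By contrast, converting each release to (ε, δ)-DP first and then applying basic composition gives an ε that grows linearly in k.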