Many differentially private (DP) data release systems either output DP synthetic data and leave analysts to perform inference as usual, which can lead to severe miscalibration, or output a DP point estimate with no principled route to uncertainty quantification. This paper develops a clean and tractable middle ground for exponential families: release only DP sufficient statistics, then perform noise-calibrated likelihood-based inference and optional parametric synthetic data generation as post-processing. Our contributions are: (1) a general recipe for approximate-DP release of clipped sufficient statistics under the Gaussian mechanism; (2) asymptotic normality, explicit variance inflation, and valid Wald-style confidence intervals for the plug-in DP MLE; (3) a noise-aware likelihood correction that is first-order equivalent to the plug-in estimator but supports bootstrap-based intervals; and (4) a matching minimax lower bound showing that the privacy distortion rate is unavoidable. The resulting theory yields concrete design rules and a practical pipeline for releasing DP synthetic data with principled uncertainty quantification, validated on three exponential families and real census data.
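As a minimal sketch of the pipeline the abstract describes, the following illustrates contributions (1) and (2) for the Bernoulli family: clip the sufficient statistic, release its mean under the (ε, δ) Gaussian mechanism, and form a Wald interval whose variance adds the privacy-noise term to the usual sampling variance. The clipping bounds, noise calibration constant, and helper names here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_release_mean(x, clip, eps, delta):
    """Release a clipped mean under the (eps, delta) Gaussian mechanism.

    After clipping each record to [0, clip], changing one record moves the
    mean by at most clip / n, so that is the L2 sensitivity. The noise scale
    uses the standard Gaussian-mechanism calibration sqrt(2 ln(1.25/delta))/eps.
    """
    n = len(x)
    clipped = np.clip(x, 0.0, clip)
    sens = clip / n
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return clipped.mean() + rng.normal(0.0, sigma), sigma

# Bernoulli example: the sample mean is the sufficient statistic, and the
# plug-in DP MLE is just the released noisy mean (post-processing).
n = 10_000
x = rng.binomial(1, 0.3, size=n)
p_hat, sigma = dp_release_mean(x, clip=1.0, eps=1.0, delta=1e-5)
p_hat = float(np.clip(p_hat, 1e-6, 1 - 1e-6))  # keep the estimate in (0, 1)

# Explicit variance inflation: sampling variance plus privacy-noise variance.
var = p_hat * (1.0 - p_hat) / n + sigma**2
ci = (p_hat - 1.96 * np.sqrt(var), p_hat + 1.96 * np.sqrt(var))
```

Because the privacy noise variance σ² enters the Wald variance additively, the interval widens exactly by the amount the mechanism injects, which is the "noise-calibrated" correction the abstract refers to; with parameters estimated this way, sampling from Bernoulli(p_hat) would give the optional parametric synthetic data as pure post-processing.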