Sufficient statistic perturbation (SSP) is a widely used method for differentially private linear regression. SSP adopts a data-independent approach in which privacy noise drawn from a simple distribution is added to the sufficient statistics. However, sufficient statistics can often be expressed as linear queries and better approximated by data-dependent mechanisms. In this paper we introduce data-dependent SSP for linear regression based on post-processing privately released marginals, and find that it outperforms state-of-the-art data-independent SSP. We extend this result to logistic regression by developing an approximate objective that can be expressed in terms of sufficient statistics, resulting in a novel and highly competitive SSP approach for logistic regression. We also make a connection to synthetic data for machine learning: for models with sufficient statistics, training on synthetic data corresponds to data-dependent SSP, with the overall utility determined by how well the mechanism answers these linear queries.
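To make the baseline concrete, here is a minimal sketch of data-independent SSP for linear regression: the sufficient statistics X'X and X'y are perturbed with Gaussian noise and the noisy normal equations are solved. The sensitivity calibration, the row-norm bound, and the ridge term are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

def ssp_linear_regression(X, y, epsilon, delta, bound=1.0, seed=0):
    """Illustrative data-independent SSP sketch (Gaussian mechanism).

    Assumes each row of the stacked data [x_i, y_i] has L2 norm <= bound
    (clipping is presumed to happen upstream). The sensitivity constant
    and calibration below are a simplified, hypothetical choice.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Sufficient statistics of ordinary least squares.
    XtX = X.T @ X
    Xty = X.T @ y

    # Simplified L2 sensitivity under add/remove of one bounded row,
    # and a standard Gaussian-mechanism noise scale.
    sens = bound ** 2
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

    # Perturb the statistics; symmetrize the matrix noise so the
    # released X'X remains symmetric.
    noise = rng.normal(0.0, sigma, size=(d, d))
    noisy_XtX = XtX + (noise + noise.T) / 2.0
    noisy_Xty = Xty + rng.normal(0.0, sigma, size=d)

    # Post-process: solve the noisy normal equations, with a small
    # ridge term for numerical invertibility.
    theta = np.linalg.solve(noisy_XtX + 1e-3 * np.eye(d), noisy_Xty)
    return theta
```

The data-dependent alternative proposed in the paper would instead answer the same linear queries via privately released marginals and post-processing, rather than adding independent noise directly to each statistic.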