We extend the framework of augmented distribution testing (Aliakbarpour, Indyk, Rubinfeld, and Silwal, NeurIPS 2024) to the differentially private setting. This captures scenarios where a data analyst must perform hypothesis testing tasks on sensitive data, but can leverage prior knowledge (public, but possibly erroneous or untrusted) about the data distribution. In this augmented setting, we design private algorithms for three flagship distribution testing tasks (uniformity, identity, and closeness testing) whose sample complexity scales smoothly with the claimed quality of the auxiliary information. We complement our algorithms with information-theoretic lower bounds, showing that their sample complexity is optimal up to logarithmic factors.