Hypothesis testing is a central problem in statistical analysis, and there is currently a lack of differentially private tests that are both statistically valid and powerful. In this paper, we develop several new differentially private (DP) nonparametric hypothesis tests. Our tests are based on the Kolmogorov-Smirnov, Kuiper, Cram\'er-von Mises, and Wasserstein test statistics, each of which can be expressed as a pseudo-metric on empirical cumulative distribution functions (ecdfs) and can be used for goodness-of-fit, two-sample, and paired-data hypotheses. We show that these test statistics have low sensitivity, so only a small amount of noise is needed to satisfy DP. In particular, we show that the sensitivity of each test statistic can be expressed in terms of the base sensitivity, which is the pseudo-metric distance between the ecdfs of adjacent databases and is easily calculated. The sampling distributions of our test statistics are distribution-free under the null hypothesis, enabling easy computation of $p$-values by Monte Carlo methods. We show that in several settings, especially with small privacy budgets or heavy-tailed data, our new DP tests outperform alternative nonparametric DP tests.
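To make the recipe above concrete, the following is a minimal sketch of a one-sample (goodness-of-fit) DP Kolmogorov-Smirnov test. It is an illustration under stated assumptions, not the paper's implementation: the noise is assumed to come from the Laplace mechanism, the sample size $n$ is treated as public, the null cdf is assumed continuous, and the function names `ks_statistic` and `dp_ks_test` are hypothetical. The sketch uses the fact that replacing one observation moves the ecdf by at most $1/n$ in sup norm, so the KS statistic has sensitivity $1/n$, and that under the null the statistic is distribution-free, so its noisy null distribution can be simulated from uniform samples.

```python
import numpy as np


def ks_statistic(u):
    """KS distance between the ecdf of u and the Uniform(0, 1) cdf."""
    u = np.sort(np.asarray(u, dtype=float))
    n = len(u)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)         # largest gap where the ecdf is above the cdf
    d_minus = np.max(u - (i - 1) / n)  # largest gap where the ecdf is below the cdf
    return max(d_plus, d_minus)


def dp_ks_test(x, null_cdf, epsilon, n_sim=2000, rng=None):
    """Sketch of an epsilon-DP goodness-of-fit KS test (Laplace noise is an assumption).

    x        : 1-D array of private observations
    null_cdf : vectorized, continuous cdf under the null hypothesis
    epsilon  : privacy budget
    Returns (noisy statistic, Monte Carlo p-value).
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    # Changing one observation shifts the ecdf by at most 1/n in sup norm,
    # so the KS statistic has sensitivity 1/n.
    scale = (1.0 / n) / epsilon

    # Probability integral transform: under the null, null_cdf(x) is Uniform(0, 1).
    u = null_cdf(np.asarray(x, dtype=float))
    noisy_stat = ks_statistic(u) + rng.laplace(0.0, scale)

    # The null distribution is distribution-free, so simulate it from uniform
    # samples and add the same noise. This step never touches the private data.
    sims = np.array([ks_statistic(rng.random(n)) for _ in range(n_sim)])
    sims = sims + rng.laplace(0.0, scale, size=n_sim)

    p_value = (1 + np.sum(sims >= noisy_stat)) / (n_sim + 1)
    return noisy_stat, p_value
```

Only `noisy_stat` depends on the private data, so the $p$-value is a post-processing of a DP release. For example, `dp_ks_test(x, lambda t: np.clip(t, 0.0, 1.0), epsilon=1.0)` tests whether `x` is Uniform(0, 1).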