Recent years have witnessed growing concerns about the privacy of sensitive data. In response, differential privacy has emerged as a rigorous framework for privacy protection and has gained widespread recognition in both academia and industry. While substantial progress has been made in private data analysis, existing methods are often impractical or incur a significant loss of statistical efficiency. This paper aims to alleviate these concerns in the context of hypothesis testing by introducing differentially private permutation tests. The proposed framework extends classical non-private permutation tests to private settings while rigorously guaranteeing both finite-sample validity and differential privacy. The power of the proposed test depends on the choice of test statistic, and we establish general conditions for consistency and non-asymptotic uniform power. To demonstrate the utility and practicality of our framework, we focus on reproducing kernel-based test statistics and introduce differentially private kernel tests for two-sample and independence testing: dpMMD and dpHSIC. The proposed kernel tests are straightforward to implement, applicable to various types of data, and attain minimax optimal power across different privacy regimes. Our empirical evaluations further highlight their competitive power in various synthetic and real-world scenarios, emphasizing their practical value. The code is publicly available to facilitate the implementation of our framework.
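To make the idea of a differentially private permutation test concrete, here is a minimal sketch for the two-sample case. It is not the authors' dpMMD implementation. It assumes a Gaussian kernel with a fixed bandwidth, the biased (V-statistic) squared MMD, and independent Laplace noise with scale `2 * sensitivity / epsilon` added to the original statistic and to each permuted statistic, with the p-value computed from the noisy values. The noise constant is our reading of the construction and should be treated as an assumption. The sensitivity bound `4 / min(n, m)` is a simple, likely loose, bound we derive for kernel values in [0, 1], not the paper's constant. All function names (`dp_permutation_mmd_test`, `mmd2_from_indices`, etc.) are hypothetical, and any privacy guarantee applies only to the released p-value, not to the intermediate statistics.

```python
import math
import random
from typing import List, Sequence


def gaussian_kernel_matrix(z: Sequence[float], bandwidth: float) -> List[List[float]]:
    """Pairwise Gaussian kernel values exp(-(a - b)^2 / (2 h^2)); all lie in (0, 1]."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in z] for a in z]


def mmd2_from_indices(k: List[List[float]], idx_x: List[int], idx_y: List[int]) -> float:
    """Biased (V-statistic) squared MMD between the pooled points indexed by idx_x and idx_y."""
    n, m = len(idx_x), len(idx_y)
    kxx = sum(k[i][j] for i in idx_x for j in idx_x) / (n * n)
    kyy = sum(k[i][j] for i in idx_y for j in idx_y) / (m * m)
    kxy = sum(k[i][j] for i in idx_x for j in idx_y) / (n * m)
    return kxx + kyy - 2 * kxy


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) draw: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_permutation_mmd_test(
    x: Sequence[float],
    y: Sequence[float],
    epsilon: float,
    bandwidth: float = 1.0,
    n_perm: int = 99,
    seed: int = 0,
) -> float:
    """Hypothetical sketch of a private permutation two-sample test; returns a p-value."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    k = gaussian_kernel_matrix(pooled, bandwidth)  # computed once, reused for every split
    # Assumed sensitivity bound: with kernel values in [0, 1], replacing one pooled
    # point changes the V-statistic MMD^2 by less than 4 / min(n, m) for any split.
    sensitivity = 4 / min(n, m)
    # Assumed noise scale; every statistic (original and permuted) gets its own noise draw.
    scale = 2 * sensitivity / epsilon
    idx = list(range(n + m))
    noisy = []
    for b in range(n_perm + 1):
        if b > 0:
            rng.shuffle(idx)  # b = 0 keeps the original split; later b use random splits
        stat = mmd2_from_indices(k, idx[:n], idx[n:])
        noisy.append(stat + laplace(scale, rng))
    # Monte Carlo permutation p-value computed on the noisy statistics.
    exceed = sum(1 for t in noisy[1:] if t >= noisy[0])
    return (1 + exceed) / (n_perm + 1)
```

A usage example, with data generated only for illustration:

```python
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [rng.gauss(3, 1) for _ in range(30)]
p_value = dp_permutation_mmd_test(x, y, epsilon=1.0)
```

Because the same noise mechanism is applied to the original and permuted statistics, the noisy values remain exchangeable under the null. This is what lets the usual permutation p-value stay valid in finite samples, while smaller `epsilon` (more noise) costs power rather than validity.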