Recent years have witnessed growing concerns about the privacy of sensitive data. In response to these concerns, differential privacy has emerged as a rigorous framework for privacy protection, gaining widespread recognition in both academic and industrial circles. While substantial progress has been made in private data analysis, existing methods often suffer from impracticality or a significant loss of statistical efficiency. This paper aims to alleviate these concerns in the context of hypothesis testing by introducing differentially private permutation tests. The proposed framework extends classical non-private permutation tests to private settings, maintaining both finite-sample validity and differential privacy in a rigorous manner. The power of the proposed test depends on the choice of a test statistic, and we establish general conditions for consistency and non-asymptotic uniform power. To demonstrate the utility and practicality of our framework, we focus on reproducing kernel-based test statistics and introduce differentially private kernel tests for two-sample and independence testing: dpMMD and dpHSIC. The proposed kernel tests are straightforward to implement, applicable to various types of data, and attain minimax optimal power across different privacy regimes. Our empirical evaluations further highlight their competitive power under various synthetic and real-world scenarios, emphasizing their practical value. The code is publicly available to facilitate the implementation of our framework.
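To make the idea of a differentially private permutation test concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the privacy mechanism, so this sketch assumes a Laplace-noise construction: independent Laplace noise is added to the observed statistic and to each permuted statistic, and the usual permutation p-value is computed from the noisy values. The function name `dp_permutation_pvalue`, the mean-difference statistic on data clipped to [0, 1], the sensitivity bound `1 / min(n, m)`, and the noise scale `2 * sensitivity / epsilon` are all illustrative assumptions; dpMMD and dpHSIC use kernel statistics instead.

```python
import random


def dp_permutation_pvalue(x, y, epsilon, n_perms=199, seed=0):
    """Hypothetical two-sample permutation test with Laplace-noised statistics."""
    rng = random.Random(seed)

    # Clip to [0, 1] so that changing one observation has a bounded effect.
    x = [min(max(float(v), 0.0), 1.0) for v in x]
    y = [min(max(float(v), 0.0), 1.0) for v in y]
    n, m = len(x), len(y)
    pooled = x + y

    def stat(z):
        # Absolute difference between the means of the first n and last m points.
        return abs(sum(z[:n]) / n - sum(z[n:]) / m)

    def laplace(scale):
        # The difference of two independent Exp(1) draws is Laplace(0, 1).
        return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

    # Replacing one clipped point moves one group mean by at most 1/n or 1/m.
    sensitivity = 1.0 / min(n, m)
    # Assumed noise scale for this sketch (not stated in the abstract).
    scale = 2.0 * sensitivity / epsilon

    # Index 0 holds the noisy observed statistic; the rest are noisy permuted ones.
    noisy = [stat(pooled) + laplace(scale)]
    for _ in range(n_perms):
        perm = pooled[:]
        rng.shuffle(perm)
        noisy.append(stat(perm) + laplace(scale))

    # Permutation p-value; the observed statistic counts itself, so p >= 1/(B+1).
    return sum(v >= noisy[0] for v in noisy) / (n_perms + 1)


if __name__ == "__main__":
    data_rng = random.Random(1)
    a = [data_rng.uniform(0.0, 0.5) for _ in range(200)]
    b = [data_rng.uniform(0.5, 1.0) for _ in range(200)]
    print(dp_permutation_pvalue(a, b, epsilon=1.0))
```

Rejecting when the returned p-value is at most the chosen level gives the usual permutation-test decision rule. In this sketch, a smaller `epsilon` means more noise relative to the statistic and therefore lower power, which illustrates the privacy and efficiency trade-off the abstract refers to.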