Differentially Private Multivariate Statistics with an Application to Contingency Table Analysis

Over the past decade, differential privacy (DP) has become a central, rigorous framework for privacy protection. We use Gaussian differential privacy (GDP) to gauge the level of privacy protection when releasing statistical summaries of data. GDP is a natural and easy-to-interpret differential privacy criterion based on the statistical hypothesis testing framework. The Gaussian mechanism is a natural and fundamental mechanism that can be used to perturb multivariate statistics so that they satisfy a $\mu$-GDP criterion, where $\mu>0$ stands for the level of privacy protection. Requiring a certain level of differential privacy inevitably leads to a loss of statistical utility. We improve on ordinary Gaussian mechanisms by developing rank-deficient James-Stein Gaussian mechanisms for releasing private multivariate statistics, and show that the proposed mechanisms have higher statistical utility. Laplace mechanisms, the most commonly used mechanisms in the pure DP framework, are also investigated under the GDP criterion. We show that optimal calibration of multivariate Laplace mechanisms requires more information about the statistic than its global sensitivity alone, and we derive the minimal amount of Laplace perturbation needed to release $\mu$-GDP contingency tables. Gaussian mechanisms are shown to have higher statistical utility than Laplace mechanisms, except at very low levels of privacy. The utility of the proposed multivariate mechanisms is further demonstrated using differentially private hypothesis tests on contingency tables. Bootstrap-based goodness-of-fit and homogeneity tests that use the proposed rank-deficient James-Stein mechanisms exhibit higher power than natural competitors.
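As a rough illustration of the ordinary Gaussian mechanism mentioned above (not the proposed rank-deficient James-Stein mechanism), the following Python sketch perturbs a contingency table with isotropic Gaussian noise. It assumes the standard calibration from the GDP literature, in which noise with standard deviation $\Delta/\mu$ yields $\mu$-GDP when $\Delta$ is the $L_2$ global sensitivity of the statistic. The function name `gaussian_mechanism`, the example counts, and the choice $\Delta=\sqrt{2}$ (one record moving between two cells) are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_mechanism(stat, l2_sensitivity, mu, rng=None):
    """Return stat plus N(0, sigma^2 I) noise with sigma = l2_sensitivity / mu.

    Assumption: this is the standard calibration under which the Gaussian
    mechanism satisfies mu-GDP; it is the ordinary mechanism, not the
    rank-deficient James-Stein variant proposed in the paper.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    rng = np.random.default_rng() if rng is None else rng
    stat = np.asarray(stat, dtype=float)
    sigma = l2_sensitivity / mu  # smaller mu (stronger privacy) means more noise
    return stat + rng.normal(loc=0.0, scale=sigma, size=stat.shape)


# Hypothetical 2x2 contingency table of counts.
table = np.array([[12, 30], [25, 8]])

# Assumption: neighboring datasets differ by replacing one record, which
# changes two cells by +1 and -1, so the L2 sensitivity is sqrt(2).
private_table = gaussian_mechanism(
    table, l2_sensitivity=np.sqrt(2), mu=1.0, rng=np.random.default_rng(0)
)
print(private_table)
```

The released table is real-valued and may contain negative entries; any post-processing such as rounding or truncation does not weaken the privacy guarantee, but it is left out of this sketch.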
