We develop a theory of asymptotic efficiency in regular parametric models when data confidentiality is ensured by local differential privacy (LDP). Although efficient parameter estimation is a classical and well-studied problem in mathematical statistics, the LDP setting raises several non-trivial obstacles. Starting from a standard parametric model $\mathcal P=(P_\theta)_{\theta\in\Theta}$, $\Theta\subseteq\mathbb R^p$, for the i.i.d. unobserved sensitive data $X_1,\dots, X_n$, we establish local asymptotic mixed normality (along subsequences) of the model $$Q^{(n)}\mathcal P=(Q^{(n)}P_\theta^n)_{\theta\in\Theta}$$ generating the sanitized observations $Z_1,\dots, Z_n$, where $Q^{(n)}$ is an arbitrary sequence of sequentially interactive privacy mechanisms. This result readily implies convolution and local asymptotic minimax theorems. In the case $p=1$, the optimal asymptotic variance is found to be the inverse of the supremal Fisher information $\sup_{Q\in\mathcal Q_\alpha} I_\theta(Q\mathcal P)\in\mathbb R$, where the supremum runs over all $\alpha$-differentially private (marginal) Markov kernels. We present an algorithm for finding a (nearly) optimal privacy mechanism $\hat{Q}$ and an estimator $\hat{\theta}_n(Z_1,\dots, Z_n)$, based on the corresponding sanitized data, that achieves this asymptotically optimal variance.
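To make the quantity $I_\theta(Q\mathcal P)$ concrete, the following minimal sketch evaluates it for one particular choice that is not taken from the text above: a Bernoulli($\theta$) model sanitized by binary randomized response, which is one example of an $\alpha$-differentially private Markov kernel. The function names are hypothetical. The value it returns is only a lower bound on $\sup_{Q\in\mathcal Q_\alpha} I_\theta(Q\mathcal P)$ for this model, and the sketch is not the algorithm referred to above.

```python
import math


def rr_keep_prob(alpha: float) -> float:
    """Probability that randomized response reports the true bit.

    Assumption: binary randomized response with privacy level alpha,
    so that keep / (1 - keep) = exp(alpha).
    """
    return math.exp(alpha) / (1.0 + math.exp(alpha))


def fisher_info_bernoulli(theta: float) -> float:
    """Fisher information of one non-private Bernoulli(theta) observation."""
    return 1.0 / (theta * (1.0 - theta))


def fisher_info_rr(theta: float, alpha: float) -> float:
    """Fisher information of one sanitized bit Z under randomized response.

    Z is Bernoulli(q) with q = theta * keep + (1 - theta) * (1 - keep),
    hence I_theta = (dq/dtheta)^2 / (q * (1 - q)).
    """
    keep = rr_keep_prob(alpha)
    q = theta * keep + (1.0 - theta) * (1.0 - keep)
    dq = 2.0 * keep - 1.0  # derivative of q with respect to theta
    return dq ** 2 / (q * (1.0 - q))


if __name__ == "__main__":
    theta, alpha = 0.3, 1.0
    print("non-private information:", fisher_info_bernoulli(theta))
    print("sanitized information:  ", fisher_info_rr(theta, alpha))
```

The sanitized information is strictly smaller than the non-private one for finite $\alpha$ and approaches it as $\alpha$ grows, which illustrates why the choice of mechanism $Q$ matters for the attainable asymptotic variance.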