We study the problem of estimating non-linear functionals of discrete distributions in the context of local differential privacy. The initial data $x_1,\ldots,x_n \in [K]$ are assumed to be i.i.d. and distributed according to an unknown discrete distribution $p = (p_1,\ldots,p_K)$. Only $\alpha$-locally differentially private (LDP) samples $z_1,\ldots,z_n$ are publicly available, where the term 'local' means that each $z_i$ is produced using a single individual attribute $x_i$. We exhibit privacy mechanisms (PM) that are either interactive (i.e. they are allowed to use already published confidential data) or non-interactive. We describe the behavior of the quadratic risk for estimating the power sum functional $F_{\gamma} = \sum_{k=1}^K p_k^{\gamma}$, $\gamma > 0$, as a function of $K$, $n$ and $\alpha$. In the non-interactive case, we study two plug-in type estimators of $F_{\gamma}$, for all $\gamma > 0$, that are similar to the MLE analyzed by Jiao et al. (2017) in the multinomial model. However, due to the privacy constraint, the rates we attain are slower and similar to those obtained in the Gaussian model by Collier et al. (2020). In the interactive case, we introduce, for all $\gamma > 1$, a two-step procedure which attains the faster parametric rate $(n \alpha^2)^{-1/2}$ when $\gamma \geq 2$. We give lower bound results over all $\alpha$-LDP mechanisms and all estimators using the private samples.
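To make the non-interactive setting concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the privacy mechanisms or the exact form of the plug-in estimators, so this code assumes the standard Laplace mechanism applied to one-hot encodings of $x_i$ and a naive clipped plug-in estimate of $F_{\gamma}$; it is an illustration, not the authors' estimators. The function names `laplace_privatize` and `plugin_power_sum` are hypothetical. Since two one-hot vectors differ by 2 in $\ell_1$ norm, adding i.i.d. Laplace noise of scale $2/\alpha$ to each coordinate yields an $\alpha$-LDP release.

```python
import numpy as np


def laplace_privatize(x, K, alpha, rng):
    """Hypothetical non-interactive alpha-LDP mechanism (assumed Laplace mechanism).

    Each x_i in {0, ..., K-1} is released as z_i = e_{x_i} + W_i, where W_i has
    i.i.d. Laplace(0, 2/alpha) coordinates. Two one-hot vectors differ by 2 in
    L1 norm, so this noise scale guarantees alpha-LDP for each z_i.
    """
    n = len(x)
    onehot = np.zeros((n, K))
    onehot[np.arange(n), x] = 1.0
    return onehot + rng.laplace(loc=0.0, scale=2.0 / alpha, size=(n, K))


def plugin_power_sum(z, gamma):
    """Hypothetical naive plug-in estimate of F_gamma = sum_k p_k^gamma.

    The coordinate-wise mean of the private vectors is an unbiased estimate of p;
    negative coordinates are clipped at zero before taking the power sum.
    """
    p_hat = np.clip(z.mean(axis=0), 0.0, None)
    return float(np.sum(p_hat ** gamma))
```

Usage: with `rng = np.random.default_rng(0)`, draw `x = rng.choice(K, size=n, p=p)`, privatize with `z = laplace_privatize(x, K, alpha, rng)`, then call `plugin_power_sum(z, 2.0)`. The injected noise has variance $8/\alpha^2$ per coordinate, which is why the effective sample size behaves like $n\alpha^2$ rather than $n$.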