We study person-level differentially private (DP) mean estimation in the case where each person holds multiple samples. DP here requires the usual notion of distributional stability when $\textit{all}$ of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that \[n = \tilde \Theta\left(\frac{d}{\alpha^2 m} + \frac{d}{\alpha m^{1/2} \varepsilon} + \frac{d}{\alpha^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\] people are necessary and sufficient to estimate the mean up to distance $\alpha$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the standard clip-and-noise framework, but our setting requires both new algorithmic techniques and a new analysis. In particular, our new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables may be of independent interest.
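To make the clip-and-noise framework concrete, here is a minimal illustrative sketch for person-level DP mean estimation, not the paper's exact algorithm: each person's $m$ samples are averaged, the per-person means are clipped to an $\ell_2$ ball of radius $R$ (a hypothetical, assumed-known clipping radius), and Laplace noise calibrated to the resulting person-level $\ell_1$-sensitivity is added to the average.

```python
import numpy as np

def clip_and_noise_mean(X, clip_radius, epsilon, rng):
    """Illustrative clip-and-noise estimator for person-level pure eps-DP.

    X has shape (n, m, d): n people, m samples each, in d dimensions.
    Changing ALL of one person's samples moves the average of the
    clipped per-person means by at most 2*clip_radius/n in l2-norm,
    hence at most 2*clip_radius*sqrt(d)/n in l1-norm, which calibrates
    the coordinate-wise Laplace noise.
    """
    n, _, d = X.shape
    person_means = X.mean(axis=1)  # (n, d): one mean per person
    norms = np.linalg.norm(person_means, axis=1, keepdims=True)
    # Project each per-person mean onto the l2 ball of radius clip_radius.
    scale = np.minimum(1.0, clip_radius / np.maximum(norms, 1e-12))
    clipped = person_means * scale
    avg = clipped.mean(axis=0)
    # l1-sensitivity bound for the clipped average under person-level DP.
    sens_l1 = 2.0 * clip_radius * np.sqrt(d) / n
    noise = rng.laplace(scale=sens_l1 / epsilon, size=avg.shape)
    return avg + noise
```

In this toy version the clipping radius is fixed in advance; choosing it (and sharpening the noise via tail bounds on sums of bounded-moments vectors) is where the paper's new techniques enter.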