Most of the literature on differential privacy considers the item-level setting, in which each user holds a single observation. A growing area of interest is user-level privacy, in which each of the $n$ users holds $T$ observations and wishes to preserve the privacy of their entire collection. In this paper, we derive a general minimax lower bound showing that, for locally private user-level estimation problems, the risk cannot in general be made to vanish for a fixed number of users, even when each user holds an arbitrarily large number of observations. We then derive lower and upper bounds, matching up to logarithmic factors, for univariate and multidimensional mean estimation, sparse mean estimation and non-parametric density estimation. In particular, with the other model parameters held fixed, we observe phase transition phenomena in the minimax rates as $T$, the number of observations each user holds, varies. For (non-sparse) mean estimation and density estimation, when $T$ is below a phase transition boundary, the rate is the same as with $nT$ users in the item-level setting. However, different behaviour arises in $s$-sparse $d$-dimensional mean estimation: consistent estimation is impossible in the item-level setting when $d$ exceeds the number of observations, but it is possible in the user-level setting when $T \gtrsim s \log (d)$, up to logarithmic factors. This may be of independent interest for applications, as an example of a high-dimensional problem that is feasible under local privacy constraints.
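The abstract does not specify an estimator. Purely to make the user-level local privacy constraint concrete, here is a minimal sketch of a naive baseline. It is an assumption for illustration and not the estimator studied in the paper. Each user privatizes a single summary of all $T$ observations, namely their sample mean, using the Laplace mechanism. The server then averages the $n$ reports. The sketch assumes every observation lies in $[-1, 1]$. Under that assumption the user's mean can move by at most 2 when the whole collection changes, so Laplace noise with scale $2/\varepsilon$ satisfies user-level $\varepsilon$-local differential privacy. The helper names (`laplace_noise`, `privatize_user`, `estimate_mean`) are hypothetical.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two Exp(mean=scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_user(samples: list[float], epsilon: float, rng: random.Random) -> float:
    """Release one epsilon-LDP summary of a user's entire collection (naive baseline).

    Assumption: observations lie in [-1, 1]; clipping enforces this. The clipped
    sample mean then always lies in [-1, 1], so replacing the whole collection
    changes it by at most 2, and Laplace noise of scale 2 / epsilon suffices.
    """
    clipped = [min(1.0, max(-1.0, x)) for x in samples]
    user_mean = statistics.fmean(clipped)
    return user_mean + laplace_noise(2.0 / epsilon, rng)


def estimate_mean(users: list[list[float]], epsilon: float, seed: int = 0) -> float:
    """Server side: average the n privatized user reports."""
    rng = random.Random(seed)
    reports = [privatize_user(samples, epsilon, rng) for samples in users]
    return statistics.fmean(reports)


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Hypothetical data: n users, each holding T draws from Uniform[0.2, 0.6] (mean 0.4).
    n, T = 20000, 50
    users = [[data_rng.uniform(0.2, 0.6) for _ in range(T)] for _ in range(n)]
    print(estimate_mean(users, epsilon=1.0))
```

In this baseline, each report carries Laplace noise with variance $2(2/\varepsilon)^2 = 8/\varepsilon^2$. The privacy contribution to the estimator's variance is therefore $8/(n\varepsilon^2)$, and it does not shrink as $T$ grows. For fixed $n$, this is consistent in spirit with the lower bound above. Attaining the faster, $T$-dependent rates described in the abstract requires more refined schemes than this sketch.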