Most of the literature on differential privacy considers the item-level case, where each user has a single observation. A growing field of interest, however, is user-level privacy, where each of the $n$ users holds $T$ observations and wishes to maintain the privacy of their entire collection. In this paper, we derive a general minimax lower bound, which shows that, for locally private user-level estimation problems, the risk cannot, in general, be made to vanish for a fixed number of users, even when each user holds an arbitrarily large number of observations. We then derive lower and upper bounds, matching up to logarithmic factors, for univariate and multidimensional mean estimation, sparse mean estimation and non-parametric density estimation. In particular, with other model parameters held fixed, we observe phase transition phenomena in the minimax rates as $T$, the number of observations each user holds, varies. In the case of (non-sparse) mean estimation and density estimation, we see that, for $T$ below a phase transition boundary, the rate is the same as having $nT$ users in the item-level setting. However, different behaviour is observed in the case of $s$-sparse $d$-dimensional mean estimation, wherein consistent estimation is impossible when $d$ exceeds the number of observations in the item-level setting, but is possible in the user-level setting when $T \gtrsim s \log (d)$, up to logarithmic factors. This may be of independent interest for applications, as an example of a high-dimensional problem that is feasible under local privacy constraints.
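To make the user-level local privacy setting concrete, the following is a minimal sketch of a naive baseline for univariate mean estimation. It is not the estimator analysed in the paper. Each of the $n$ users averages its own $T$ observations, clips that average to an assumed range $[-B, B]$, and releases it through the Laplace mechanism. The server then averages the $n$ privatized messages. The function name `user_level_ldp_mean` and the clipping bound `bound` (playing the role of $B$) are hypothetical choices made for illustration. For this one estimator, the sketch also shows the flavour of the general lower bound. Because each user's whole collection must be protected, every message carries Laplace noise of variance $8B^2/\varepsilon^2$ regardless of $T$. As a result, with $n$ fixed, the squared error stays near $8B^2/(n\varepsilon^2)$ even as $T$ grows.

```python
import numpy as np


def user_level_ldp_mean(data, epsilon, bound=1.0, rng=None):
    """Naive user-level epsilon-LDP mean estimate (illustrative baseline only).

    data: array of shape (n, T); row i holds the T observations of user i.
    bound: assumed clipping range [-bound, bound] for each user's average.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    # Each user summarises its whole collection by its own clipped average.
    user_means = np.clip(data.mean(axis=1), -bound, bound)
    # Replacing a user's entire collection moves the clipped average by at most
    # 2 * bound, so Laplace noise of scale 2 * bound / epsilon makes each
    # user's single released message epsilon-differentially private.
    noise = rng.laplace(loc=0.0, scale=2.0 * bound / epsilon, size=user_means.shape)
    privatized = user_means + noise
    # The server sees only the privatized messages and averages them.
    return float(privatized.mean())


if __name__ == "__main__":
    # Fixed n, large T: every user's average is exactly 0, yet the privacy
    # noise alone keeps the mean squared error near 8 * bound**2 / (n * epsilon**2).
    rng = np.random.default_rng(0)
    n, T, epsilon = 50, 1000, 1.0
    zeros = np.zeros((n, T))
    errors = [user_level_ldp_mean(zeros, epsilon, rng=rng) ** 2 for _ in range(2000)]
    # The empirical and theoretical values printed below should be close.
    print(np.mean(errors), 8 / (n * epsilon**2))
```

This sketch does not exploit the fact that each user's average concentrates as $T$ grows. It should therefore not be expected to attain the rates described above, which can match those of $nT$ item-level users.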