Previous work on user-level differential privacy (DP) [Ghazi et al. NeurIPS 2021, Bun et al. STOC 2023] obtained generic algorithms that work for various learning tasks. However, their focus was on the example-rich regime, where each user has so many examples that they could solve the problem on their own. In this work we consider the example-scarce regime, where each user has only a few examples, and obtain the following results: 1. For approximate-DP, we give a generic transformation of any item-level DP algorithm into a user-level DP algorithm. Roughly speaking, the resulting algorithm gives a multiplicative savings of $O_{\varepsilon,\delta}(\sqrt{m})$ in the number of users required to achieve the same utility, where $m$ is the number of examples per user. This algorithm recovers most known bounds for specific problems and also gives new bounds, e.g., for PAC learning. 2. For pure-DP, we present a simple technique for adapting the exponential mechanism [McSherry, Talwar FOCS 2007] to the user-level setting. This gives new bounds for a variety of tasks, such as private PAC learning, hypothesis selection, and distribution learning. For some of these problems, we show that our bounds are near-optimal.
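For readers unfamiliar with the exponential mechanism mentioned above, the following is a minimal sketch of the standard (item-level) version only: it samples a candidate with probability proportional to $\exp(\varepsilon \cdot \mathrm{score} / (2\Delta))$, where $\Delta$ is the sensitivity of the score. It is not the user-level adaptation from this work, whose details are not given here. The function names, the `sensitivity` parameter, and the toy label-counting task are illustrative assumptions.

```python
import math
import random


def selection_probabilities(scores, epsilon, sensitivity=1.0):
    """Return the exponential-mechanism sampling distribution over candidates.

    Candidate i gets probability proportional to
    exp(epsilon * scores[i] / (2 * sensitivity)).
    """
    # Subtracting the maximum score leaves the distribution unchanged
    # but avoids overflow in math.exp for large scores.
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, score, epsilon, sensitivity=1.0, rng=None):
    """Sample one candidate according to the distribution above."""
    rng = rng or random.Random()
    probs = selection_probabilities([score(c) for c in candidates], epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical toy task: privately pick the most common label.
    data = ["a", "b", "a", "c", "a", "b"]
    labels = ["a", "b", "c"]

    def count(label):
        # Changing one example changes any count by at most 1,
        # so the item-level sensitivity of this score is 1.
        return data.count(label)

    print(exponential_mechanism(labels, count, epsilon=1.0, rng=random.Random(0)))
```

In the user-level setting, one user can change up to $m$ examples at once, so a count-style score like the one above can shift by as much as $m$. Naively setting `sensitivity=m` preserves privacy at a cost in utility; the abstract indicates that this work adapts the mechanism differently.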