We study the problem of multi-task learning under user-level differential privacy, in which $n$ users contribute data to $m$ tasks, each involving a subset of the users. One important aspect of the problem, which can significantly impact quality, is the distribution skew among tasks: some tasks may have far fewer data samples than others, making them more susceptible to the noise added for privacy. It is natural to ask whether algorithms can adapt to this skew to improve overall utility. We give a systematic analysis of the problem by studying how to optimally allocate a user's privacy budget among tasks. We propose a generic algorithm based on an adaptive reweighting of the empirical loss, and show that when there is task distribution skew, it yields a quantifiable improvement in excess empirical risk. Experimental studies on recommendation problems that exhibit a long tail of small tasks demonstrate that our methods significantly improve utility, achieving state-of-the-art results on two standard benchmarks.
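To make the idea of splitting a user's budget across tasks concrete, here is a minimal sketch under stated assumptions. It is not the algorithm from this work. The allocation rule (weight each of a user's tasks by $n_t^{-\alpha}$, where $n_t$ is the number of users in task $t$), the exponent `alpha`, and the function names `task_sizes`, `allocate_budget` and `weighted_loss` are all hypothetical and were chosen only for illustration. Each user's weights are normalized to sum to one. This keeps any single user's total contribution to the reweighted loss bounded, which is the property a user-level privacy analysis relies on. Within that fixed total, smaller tasks receive a larger share.

```python
from collections import Counter


def task_sizes(user_tasks):
    """Count how many users contribute to each task (n_t)."""
    return Counter(t for tasks in user_tasks.values() for t in tasks)


def allocate_budget(user_tasks, alpha=0.5):
    """Hypothetical allocation rule (illustration only).

    Splits each user's unit budget across that user's tasks in proportion
    to n_t ** (-alpha), so smaller tasks get a larger share.
    alpha = 0 recovers a uniform split. Weights sum to 1 per user.
    """
    sizes = task_sizes(user_tasks)
    weights = {}
    for user, tasks in user_tasks.items():
        raw = {t: sizes[t] ** (-alpha) for t in tasks}
        total = sum(raw.values())
        weights[user] = {t: r / total for t, r in raw.items()}
    return weights


def weighted_loss(weights, losses):
    """Reweighted empirical loss: sum over users u and tasks t of w[u][t] * loss[u][t]."""
    return sum(
        w * losses[user][task]
        for user, ws in weights.items()
        for task, w in ws.items()
    )


if __name__ == "__main__":
    # Toy skew: task "a" has 4 users, task "b" has only 1 (user u1).
    users = {"u1": ["a", "b"], "u2": ["a"], "u3": ["a"], "u4": ["a"]}
    w = allocate_budget(users, alpha=0.5)
    print(w["u1"])  # u1 puts twice as much weight on the small task "b"
```

With `alpha=0.5`, user `u1` assigns weight 1/3 to the large task `a` and 2/3 to the small task `b`. With `alpha=0`, the split would be 1/2 each. In a private training loop, one would presumably combine such per-user weights with per-example clipping and noise calibrated to the bounded per-user contribution. How to choose the weights adaptively is the question this work addresses, and the sketch does not attempt to answer it.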