Motivated by the recent empirical success of incorporating public data into differentially private learning, we theoretically investigate how a shared representation learned from public data can improve private learning. We explore two common scenarios of transfer learning for linear regression, both of which assume the public and private tasks (regression vectors) share a low-rank subspace in a high-dimensional space. In the first single-task transfer scenario, the goal is to learn a single model shared across all users, each corresponding to a row in a dataset. We provide matching upper and lower bounds showing that our algorithm achieves the optimal excess risk within a natural class of algorithms that search for the linear model within the given subspace estimate. In the second scenario of multitask model personalization, we show that with sufficient public data, users can avoid private coordination, as purely local learning within the given subspace achieves the same utility. Taken together, our results help to characterize the benefits of public data across common regimes of private transfer learning.
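As a rough illustration of what "searching for the linear model within the given subspace estimate" could look like, the sketch below estimates a rank-k subspace from public regression vectors and then fits the private task inside it. It is a minimal, non-private sketch under stated assumptions, not the paper's algorithm: the function names (`estimate_subspace`, `fit_in_subspace`, `demo`), the SVD-based subspace estimate, the synthetic Gaussian data, and the noiseless labels are all hypothetical choices made for illustration, and the differential privacy mechanism is omitted entirely.

```python
import numpy as np


def estimate_subspace(public_thetas, k):
    """Return an orthonormal basis (d x k) for the top-k left singular
    subspace of the d x t matrix of public regression vectors.
    (Assumed estimator; the abstract does not specify one.)"""
    U, _, _ = np.linalg.svd(public_thetas, full_matrices=False)
    return U[:, :k]


def fit_in_subspace(X, y, B):
    """Ordinary least squares restricted to the column span of B.
    Non-private: a private variant would perturb this k-dimensional
    problem rather than the original d-dimensional one."""
    Z = X @ B                                   # project features to k dims
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)   # solve in the subspace
    return B @ w                                # map back to d dims


def demo(d=50, k=3, t=20, n=200, seed=0):
    """Synthetic check: public and private vectors share one k-dim subspace."""
    rng = np.random.default_rng(seed)
    B_true, _ = np.linalg.qr(rng.standard_normal((d, k)))
    public_thetas = B_true @ rng.standard_normal((k, t))  # public tasks
    theta_private = B_true @ rng.standard_normal(k)       # private task
    X = rng.standard_normal((n, d))
    y = X @ theta_private                                 # noiseless labels
    B_hat = estimate_subspace(public_thetas, k)
    theta_hat = fit_in_subspace(X, y, B_hat)
    return float(np.linalg.norm(theta_hat - theta_private))


if __name__ == "__main__":
    print("recovery error:", demo())
```

The point of the restriction is that the private fit operates on k coordinates instead of d, which is where a shared low-rank representation would be expected to help when k is much smaller than d.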