Suppose we want to train text prediction models in email clients or word processors. The models must preserve the privacy of user data and stay within a fixed size to meet memory and inference-time requirements. We introduce a generic framework to solve this problem. Specifically, we are given a public dataset $D_\text{pub}$ and a private dataset $D_\text{priv}$ corresponding to a downstream task $T$. How should we pre-train a fixed-size model $M$ on $D_\text{pub}$ and fine-tune it on $D_\text{priv}$ such that the performance of $M$ with respect to $T$ is maximized and $M$ satisfies differential privacy with respect to $D_\text{priv}$? We show that pre-training on a {\em subset} of $D_\text{pub}$ that brings the public distribution closer to the private distribution is a crucial ingredient for maximizing the transfer learning abilities of $M$ after pre-training, especially in regimes where models are relatively small. Besides performance improvements, our framework also shows that with careful pre-training and private fine-tuning, {\em smaller models} can match the performance of much larger models, highlighting the promise of differentially private training as a tool for model compression and efficiency.
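The question posed above can be written as a constrained optimization problem. The following is only a sketch under assumed notation that does not appear in the original: $\mathcal{A}$ stands for the entire randomized pipeline (subset selection, pre-training, and private fine-tuning), $\mathrm{Perf}_T$ for some performance measure on task $T$, $B$ for the size budget, and $(\varepsilon, \delta)$ for the privacy parameters. The constraint is the standard definition of $(\varepsilon, \delta)$-differential privacy, stated with respect to $D_\text{priv}$ only.

\[
\begin{aligned}
\max_{\mathcal{A}} \quad & \mathbb{E}\bigl[\mathrm{Perf}_T(M)\bigr], \qquad M = \mathcal{A}(D_\text{pub}, D_\text{priv}), \qquad |M| \le B, \\
\text{s.t.} \quad & \Pr\bigl[\mathcal{A}(D_\text{pub}, D_\text{priv}) \in O\bigr] \le e^{\varepsilon} \, \Pr\bigl[\mathcal{A}(D_\text{pub}, D'_\text{priv}) \in O\bigr] + \delta \\
& \text{for all neighboring } D_\text{priv}, D'_\text{priv} \text{ (differing in one record) and all measurable sets } O.
\end{aligned}
\]

Under this reading, any step that consults $D_\text{priv}$, including the choice of which subset of $D_\text{pub}$ to pre-train on, is part of $\mathcal{A}$ and therefore counts toward the privacy budget.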