Transfer learning has become an increasingly popular technique in machine learning: a model pretrained on one task is reused to help build a finetuned model for a related task. This paradigm is especially popular for $\textit{privacy}$ in machine learning, where the pretrained model is considered public and only the finetuning data is considered sensitive. However, there are reasons to believe that the pretraining data is itself sensitive, which makes it essential to understand how much information the finetuned model leaks about it. In this work, we propose a new membership-inference threat model in which the adversary has access only to the finetuned model and aims to infer whether a given example was part of the pretraining data. To realize this threat model, we implement a novel metaclassifier-based attack, $\textbf{TMI}$, that leverages the influence of memorized pretraining samples on predictions in the downstream task. We evaluate $\textbf{TMI}$ on both vision and natural language tasks across multiple transfer learning settings, including finetuning with differential privacy. Our evaluation shows that $\textbf{TMI}$ can successfully infer membership of pretraining examples using query access to the finetuned model. An open-source implementation of $\textbf{TMI}$ is available $\href{https://github.com/johnmath/tmi-pets24}{\text{on GitHub}}$.