Machine unlearning is motivated by a desire for data autonomy: a person can request that their data's influence be removed from deployed models, and those models should then be updated as if they had been retrained without that person's data. We show that, counter-intuitively, these updates expose individuals to high-accuracy reconstruction attacks that allow an attacker to recover their data in its entirety, even when the original models are so simple that privacy risk might not otherwise have been a concern. We show how to mount a near-perfect reconstruction attack on the deleted data point for linear regression models. We then generalize our attack to other loss functions and architectures, and empirically demonstrate its effectiveness across a wide range of datasets (spanning both tabular and image data). Our work highlights that privacy risk is significant even for extremely simple model classes when individuals can request deletion of their data from the model.
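To make the linear regression claim concrete, the following is a minimal sketch of why a parameter update can leak a deleted point. It is not necessarily the attack used in this work. It assumes exact, unregularized least squares, a first feature that is a constant 1 (an intercept), and an attacker who knows the pre-deletion Gram matrix $X^\top X$ exactly; a real attacker would more plausibly have only an estimate of it. Under those assumptions the normal equations give $X^\top X(\theta' - \theta) = x\,(x^\top\theta' - y)$, where $\theta$ and $\theta'$ are the parameters before and after deletion and $(x, y)$ is the deleted point. The names `fit_ols` and `reconstruct_deleted_point` are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    # Ordinary least-squares solution of X @ theta ≈ y.
    return np.linalg.lstsq(X, y, rcond=None)[0]


def reconstruct_deleted_point(gram_before, theta_before, theta_after):
    # Assumption: gram_before is X.T @ X computed on the data before deletion.
    # From the normal equations, v = x * (x @ theta_after - y),
    # i.e. v is the deleted feature vector scaled by its post-deletion residual.
    v = gram_before @ (theta_after - theta_before)
    # Assumption: the first feature is the constant 1, which fixes the scale.
    x_hat = v / v[0]
    # v[0] equals the residual itself, so the label follows directly.
    y_hat = x_hat @ theta_after - v[0]
    return x_hat, y_hat


# Synthetic data: an intercept column plus d Gaussian features.
rng = np.random.default_rng(0)
n, d = 200, 5
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d))])
y = X @ rng.normal(size=d + 1) + 0.1 * rng.normal(size=n)

theta_before = fit_ols(X, y)          # model trained on all n points
theta_after = fit_ols(X[1:], y[1:])   # model retrained without row 0

x_hat, y_hat = reconstruct_deleted_point(X.T @ X, theta_before, theta_after)
```

In this idealized setting `x_hat` and `y_hat` match `X[0]` and `y[0]` up to floating-point error, provided the deleted point's residual under the updated model is nonzero. With an estimated Gram matrix the recovery would only be approximate.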