Good data stewardship requires removal of data at the request of the data's owner. This raises the question of whether and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to "remove" data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.
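To make the idea of removing a point from a linear model concrete, here is a minimal sketch of one possible removal update, assuming an L2-regularized least-squares model and a single Newton step on the remaining data. The function names (`fit_ridge`, `remove_point`), the fixed regularization strength `lam`, and the synthetic data are all illustrative assumptions and are not taken from the text above. For a quadratic loss the Newton step happens to reproduce retraining exactly; for other losses it would only be approximate, so this sketch does not by itself provide a certified guarantee.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize 0.5 * ||X w - y||^2 + 0.5 * lam * ||w||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def remove_point(w, X, y, idx, lam):
    """Hypothetical removal update: one Newton step on the objective without point idx."""
    mask = np.ones(len(y), dtype=bool)
    mask[idx] = False
    X_rest = X[mask]
    x_r, y_r = X[idx], y[idx]
    # Gradient of the removed point's loss at the current solution w.
    grad_removed = (x_r @ w - y_r) * x_r
    # Hessian of the regularized objective on the remaining points.
    hessian_rest = X_rest.T @ X_rest + lam * np.eye(X.shape[1])
    # At the full-data optimum the remaining-data gradient equals -grad_removed,
    # so the Newton step adds H^{-1} grad_removed.
    return w + np.linalg.solve(hessian_rest, grad_removed)

# Synthetic data (assumed purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam = 1.0

w_full = fit_ridge(X, y, lam)
w_removed = remove_point(w_full, X, y, idx=7, lam=lam)
w_retrained = fit_ridge(np.delete(X, 7, axis=0), np.delete(y, 7), lam)

# Largest coordinate-wise gap between the updated and the retrained model.
max_gap = float(np.max(np.abs(w_removed - w_retrained)))
print(max_gap)
```

Comparing `w_removed` against `w_retrained` is the natural sanity check here: the closer the two are, the harder it is to tell the updated model apart from one that never saw the removed point.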