Learning algorithms and data are the driving forces behind the transformation that machine learning is bringing to industrial intelligence. However, individuals' right to retract their personal data, together with data privacy regulations, poses a major challenge for machine learning: how to design an efficient mechanism that supports certified data removal. Removing previously seen data, known as machine unlearning, is difficult because these data points have been implicitly memorized during the training process. Retraining from scratch on the remaining data straightforwardly serves such deletion requests; however, this naive method is often computationally infeasible. We propose an unlearning scheme, random relabeling, which is applicable to generic supervised learning algorithms and efficiently handles sequential data removal requests in the online setting. For logit-based classifiers, we further develop a less constraining removal certification method based on the similarity of the resulting probability distribution to that of naive unlearning.
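To make the idea concrete, the following is a minimal sketch and not the scheme's actual specification: the abstract does not spell out the update rule, so the details here are assumptions. It assumes a softmax-regression classifier trained by SGD, treats each removal request as a few extra SGD steps on the removed point with uniformly random labels, and uses the mean KL divergence between predictive distributions as one possible similarity measure against retraining from scratch. All function names, the step count, and the learning rate are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sgd_step(W, x, y, lr):
    """One SGD step of softmax regression on a single example (x, y)."""
    p = softmax(W @ x)
    p[y] -= 1.0  # gradient of the cross-entropy loss w.r.t. the logits
    return W - lr * np.outer(p, x)

def train(X, Y, n_classes, lr=0.1, epochs=20, seed=0):
    """Plain SGD training; also serves as the naive 'retrain from scratch' baseline."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            W = sgd_step(W, X[i], Y[i], lr)
    return W

def unlearn_by_random_relabel(W, x, n_classes, rng, lr=0.1, steps=5):
    """Assumed update rule: a few SGD steps on the removed point,
    each with a freshly drawn uniform random label."""
    for _ in range(steps):
        y_rand = rng.integers(n_classes)
        W = sgd_step(W, x, y_rand, lr)
    return W

def mean_kl(W_a, W_b, X, eps=1e-12):
    """Mean KL divergence between the two models' predictive distributions on X."""
    P = np.clip(softmax(X @ W_a.T), eps, 1.0)
    Q = np.clip(softmax(X @ W_b.T), eps, 1.0)
    return float(np.mean(np.sum(P * (np.log(P) - np.log(Q)), axis=1)))

# Synthetic three-class data (illustrative only).
rng = np.random.default_rng(0)
n_classes = 3
centers = np.array([[0.0, 3.0], [3.0, -2.0], [-3.0, -2.0]])
Y = np.repeat(np.arange(n_classes), 50)
X = centers[Y] + rng.normal(size=(150, 2))
X = np.hstack([X, np.ones((150, 1))])  # constant feature acting as a bias term

W = train(X, Y, n_classes)

# Sequential removal requests, handled one at a time without retraining.
requests = [3, 77, 120]
W_unlearned = W.copy()
for i in requests:
    W_unlearned = unlearn_by_random_relabel(W_unlearned, X[i], n_classes, rng)

# Naive unlearning: retrain from scratch on the remaining data.
keep = np.setdiff1d(np.arange(len(X)), requests)
W_retrained = train(X[keep], Y[keep], n_classes)

gap = mean_kl(W_unlearned, W_retrained, X[keep])
print(f"mean KL(unlearned || retrained) on remaining data: {gap:.4f}")
```

In this sketch each request costs a fixed number of gradient steps, independent of the dataset size, whereas the retraining baseline revisits every remaining point. A certification rule in this spirit could accept a removal when the measured divergence falls below a chosen threshold; the threshold and the choice of KL divergence are assumptions of the sketch.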