As public awareness of how corporations collect and use personal information grows, it is increasingly important that consumers actively participate in the curation of corporate datasets. Accordingly, data governance frameworks such as the General Data Protection Regulation (GDPR) establish the right to be forgotten as a key principle, allowing individuals to request that their personal data be deleted from the databases and models used by organizations. To achieve forgetting in practice, several machine unlearning methods have been proposed to avoid the computational cost of retraining a model from scratch for each unlearning request. Although these methods are efficient online alternatives to retraining, it is unclear how they affect other properties critical to real-world applications, such as fairness. In this work, we propose the first fair machine unlearning method that can provably and efficiently unlearn data instances while preserving group fairness. We derive theoretical results showing that our method provably unlearns data instances while maintaining fairness objectives. Extensive experiments on real-world datasets highlight the efficacy of our method at unlearning data instances while preserving fairness.