Reports of security breaches and leaks of private, often sensitive data in recent years have made users more aware than ever of the importance of their own data. Additionally, the GDPR has been in effect in the European Union for over three years, and many people have encountered its effects in one way or another. Consequently, more and more users are actively protecting their personal data. One way to do this is to make use of the right to erasure guaranteed in the GDPR, which has potential implications for a number of different fields, such as big data and machine learning. Our paper presents an in-depth analysis of the impact of the use of the right to erasure on the performance of machine learning models on classification tasks. We conduct various experiments utilising different datasets as well as different machine learning algorithms to analyse a variety of deletion behaviour scenarios. Due to the lack of credible data on actual user behaviour, we make reasonable assumptions for various deletion modes and biases and provide insight into the effects of different plausible scenarios for right to erasure usage on data quality for machine learning. Our results show that the impact depends strongly on the amount of data deleted, the particular characteristics of the dataset, the bias chosen for deletion, and the assumptions on user behaviour.
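To make the notion of a deletion scenario concrete, the following is a minimal sketch of how unbiased and class-biased erasure could be simulated on a classification task. It is an illustration only and not the experimental setup of the paper: the synthetic two-class dataset, the nearest-centroid classifier, and the names `make_dataset`, `delete_records`, `fit_centroids` and `accuracy` are all assumptions introduced here.

```python
import random


def make_dataset(n_per_class, seed=0):
    """Synthetic stand-in for a real dataset: two Gaussian classes in 2-D."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, (0.0, 0.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            x = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((x, label))
    return data


def delete_records(data, fraction, bias=None, seed=0):
    """Simulate erasure requests (hypothetical deletion model).

    bias=None  : records are removed uniformly at random.
    bias=label : only records of that class are removed, capped at class size.
    """
    rng = random.Random(seed)
    if bias is None:
        candidates = list(range(len(data)))
    else:
        candidates = [i for i, (_, label) in enumerate(data) if label == bias]
    k = min(int(fraction * len(data)), len(candidates))
    removed = set(rng.sample(candidates, k))
    return [rec for i, rec in enumerate(data) if i not in removed]


def fit_centroids(data):
    """Nearest-centroid 'training': the mean feature vector per class."""
    sums = {}
    for (x0, x1), label in data:
        s = sums.setdefault(label, [0.0, 0.0, 0])
        s[0] += x0
        s[1] += x1
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def accuracy(centroids, data):
    """Fraction of records whose nearest centroid carries the true label."""
    correct = 0
    for (x0, x1), label in data:
        pred = min(
            centroids,
            key=lambda c: (x0 - centroids[c][0]) ** 2 + (x1 - centroids[c][1]) ** 2,
        )
        correct += pred == label
    return correct / len(data)


if __name__ == "__main__":
    train = make_dataset(100, seed=1)
    test = make_dataset(100, seed=2)
    for fraction in (0.0, 0.25, 0.45):
        for bias in (None, 1):
            reduced = delete_records(train, fraction, bias=bias)
            acc = accuracy(fit_centroids(reduced), test)
            print(f"fraction={fraction:.2f} bias={bias} "
                  f"kept={len(reduced)} accuracy={acc:.3f}")
```

Running the script prints test accuracy for each combination of deleted fraction and deletion bias. A nearest-centroid classifier ignores class frequencies, so it may react only mildly to class-biased deletion; other algorithms and real datasets can behave quite differently, which is why the sketch should be read as a template for such comparisons and not as evidence for any particular result.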