To address growing societal concerns about privacy and climate, the EU adopted the General Data Protection Regulation (GDPR) and committed to the Green Deal. Considerable research has studied the energy efficiency of software and the accuracy of machine learning models trained on anonymised data sets. Recent work has begun exploring the impact of privacy-enhancing techniques (PETs) on both the energy consumption and the accuracy of machine learning models, focusing on k-anonymity. As synthetic data is becoming an increasingly popular PET, this paper analyses energy consumption and accuracy across two phases: a) applying privacy-enhancing techniques to a data set, and b) training models on the resulting privacy-enhanced data set. We use two privacy-enhancing techniques, k-anonymisation (using generalisation and suppression) and synthetic data, and three machine learning models. Each model is trained on each privacy-enhanced data set. Our results show that models trained on k-anonymised data consume less energy than models trained on the original data, with similar accuracy. Models trained on synthetic data consume a similar amount of energy and achieve similar to lower accuracy compared with models trained on the original data.