Obfuscating a training dataset by adding random noise is crucial for protecting the privacy of sensitive samples and preventing data leakage to untrusted parties in edge applications. We conduct comprehensive experiments to investigate how dataset obfuscation affects the resulting model weights in terms of model accuracy, Frobenius-norm (F-norm)-based model distance, and level of data privacy. We also discuss potential applications using the proposed Privacy, Utility, and Distinguishability (PUD)-triangle diagram, which visualizes requirement preferences. Our experiments use the popular MNIST and CIFAR-10 datasets under both independent and identically distributed (IID) and non-IID settings. Key results include a trade-off between model accuracy and privacy level, and a trade-off between model difference and privacy level. These results indicate broad application prospects for outsourcing training in edge computing and for guarding against attacks in Federated Learning among edge devices.
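To make the two central quantities concrete, the following is a minimal sketch of noise-based dataset obfuscation and an F-norm model distance. It makes two assumptions. First, the obfuscation is zero-mean Gaussian noise with standard deviation `sigma`, clipped to the valid pixel range, because the abstract does not specify the noise distribution. Second, the distance is the Frobenius norm of the weight difference taken jointly over all layers. The names `obfuscate` and `fnorm_distance` are hypothetical and are not the authors' implementation.

```python
import numpy as np


def obfuscate(images: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise (assumed noise model) and clip to [0, 1].

    Larger sigma hides more of each sample (more privacy) but distorts the
    training signal more (less utility).
    """
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)


def fnorm_distance(weights_a: list[np.ndarray], weights_b: list[np.ndarray]) -> float:
    """Frobenius-norm distance between two models with identical layer shapes.

    Computes sqrt(sum over layers of ||A_l - B_l||_F^2), i.e. the F-norm of
    all parameters stacked together (an assumed aggregation across layers).
    """
    if len(weights_a) != len(weights_b):
        raise ValueError("models must have the same number of layers")
    total = 0.0
    for a, b in zip(weights_a, weights_b):
        if a.shape != b.shape:
            raise ValueError("layer shapes must match")
        total += float(np.sum((a - b) ** 2))
    return float(np.sqrt(total))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((4, 28, 28))  # stand-in for a small batch of MNIST-shaped images
    noisy = obfuscate(clean, sigma=0.3, rng=rng)
    print(noisy.shape)  # (4, 28, 28)

    # Toy "models": a 2x2 weight matrix plus a bias vector per model.
    w_clean = [np.ones((2, 2)), np.zeros(3)]
    w_noisy = [np.zeros((2, 2)), np.zeros(3)]
    print(fnorm_distance(w_clean, w_noisy))  # 2.0
```

In an experiment of the kind described, one would train one model on the clean data and one on the obfuscated data, then compare their accuracies and their `fnorm_distance` across several `sigma` values to trace the accuracy/privacy and distance/privacy trade-offs.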