Machine Learning (ML) is crucial in many sectors, including computer vision. However, ML models trained on sensitive data face security and privacy challenges, as attacks on them can leak information about the training data. Privacy-Preserving Machine Learning (PPML) addresses this, for example by applying Differential Privacy (DP) to balance utility and privacy. This study identifies image dataset characteristics that affect the utility and vulnerability of private and non-private Convolutional Neural Network (CNN) models. By analyzing multiple datasets and privacy budgets, we find that imbalanced datasets increase the vulnerability of minority classes, although DP mitigates this issue. Datasets with fewer classes improve both model utility and privacy, while datasets with high entropy or a low Fisher Discriminant Ratio (FDR) worsen the utility-privacy trade-off. These insights offer practitioners and researchers guidance for estimating and optimizing the utility-privacy trade-off on image datasets, informing data and privacy modifications that lead to better outcomes based on dataset characteristics.
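To make the dataset characteristics named above concrete, the following is a minimal sketch of how two of them could be measured. It assumes FDR is computed as the ratio of between-class to within-class scatter over flattened images and entropy as the mean Shannon entropy of per-image pixel-intensity histograms; the paper may define these metrics differently, and the function names and data shapes here are illustrative only.

```python
# Hedged sketch (not the paper's code): two dataset characteristics
# discussed in the abstract, computed on a NumPy image array.
import numpy as np

def fisher_discriminant_ratio(images: np.ndarray, labels: np.ndarray) -> float:
    """Ratio of between-class to within-class scatter on flattened images
    (higher values suggest more linearly separable classes)."""
    X = images.reshape(len(images), -1).astype(np.float64)
    overall_mean = X.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * np.sum((class_mean - overall_mean) ** 2)
        within += np.sum((Xc - class_mean) ** 2)
    return between / within

def mean_pixel_entropy(images: np.ndarray, bins: int = 256) -> float:
    """Average Shannon entropy (in bits) of each image's pixel-intensity histogram."""
    entropies = []
    for img in images:
        hist, _ = np.histogram(img, bins=bins, range=(0, 255))
        p = hist / hist.sum()
        p = p[p > 0]
        entropies.append(-np.sum(p * np.log2(p)))
    return float(np.mean(entropies))

if __name__ == "__main__":
    # Random stand-in data shaped like a small grayscale image dataset.
    rng = np.random.default_rng(0)
    imgs = rng.integers(0, 256, size=(100, 28, 28))
    labs = rng.integers(0, 10, size=100)
    print("FDR:", fisher_discriminant_ratio(imgs, labs))
    print("Mean pixel entropy (bits):", mean_pixel_entropy(imgs))
```

Under the abstract's findings, a dataset scoring low on the first metric or high on the second would be expected to suffer a worse utility-privacy trade-off under DP training.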