Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control

Protecting user data privacy can be achieved via many methods, from statistical transformations to generative models. However, all of them have critical drawbacks. For example, creating a transformed data set using traditional techniques is highly time-consuming. Recent deep learning-based solutions require significant computational resources in addition to long training phases, and differential privacy-based solutions may undermine data utility. In this paper, we propose $\epsilon$-PrivateSMOTE, a technique designed to safeguard against re-identification and linkage attacks, particularly in cases with a high re-identification risk. Our proposal generates synthetic data via noise-induced interpolation to obfuscate high-risk cases while maximising the utility of the original data. Compared to multiple traditional and state-of-the-art privacy-preservation methods on 17 data sets, $\epsilon$-PrivateSMOTE achieves competitive results in privacy risk and better predictive performance than generative adversarial networks, variational autoencoders, and differential privacy baselines. It also reduces energy consumption and time requirements by factors of at least 11 and 15, respectively.
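
The abstract gives no algorithmic detail, so the following is only a minimal sketch of what "noise-induced interpolation" on high-risk cases could look like, not the authors' implementation. It assumes that a high-risk case is a record whose combination of quasi-identifier values is unique in the data, and that the uniform interpolation weight of standard SMOTE is replaced by Laplace noise with scale 1/$\epsilon$. The function names `single_out_rows` and `noisy_interpolate`, the parameters `qi_cols`, `k`, and `seed`, and the toy data are all hypothetical.

```python
from collections import Counter

import numpy as np


def single_out_rows(X, qi_cols):
    """Return indices of rows whose quasi-identifier combination is unique.

    Assumption: 'high re-identification risk' is approximated here by
    uniqueness on the chosen quasi-identifier columns.
    """
    keys = [tuple(row) for row in np.asarray(X)[:, qi_cols]]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] == 1]


def noisy_interpolate(X, risky_idx, epsilon=1.0, k=2, seed=0):
    """Replace each high-risk row with a SMOTE-style synthetic point.

    Assumption: the interpolation weight is drawn per feature from a
    Laplace distribution with scale 1/epsilon instead of Uniform(0, 1).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    synthetic = []
    for i in risky_idx:
        # Euclidean distance from row i to every row; exclude row i itself.
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf
        neighbours = np.argsort(dists)[:k]
        j = rng.choice(neighbours)
        # Noisy step along the direction from row i to its neighbour j.
        weight = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=X.shape[1])
        synthetic.append(X[i] + weight * (X[j] - X[i]))
    return np.array(synthetic)


# Toy data: columns are (age, sex, income); age and sex act as quasi-identifiers.
X = np.array([[30, 1, 50], [30, 1, 52], [45, 0, 70], [31, 1, 51]], dtype=float)
risky = single_out_rows(X, qi_cols=[0, 1])
synth = noisy_interpolate(X, risky, epsilon=1.0)
```

In this sketch a smaller `epsilon` widens the Laplace distribution, so synthetic points tend to land further from the original high-risk record; that mirrors the usual privacy-utility trade-off but says nothing about the formal guarantees of the published method.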

