Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control

User data privacy can be protected with many methods, from statistical transformations to generative models. However, all of them have critical drawbacks. For example, creating a transformed data set using traditional techniques is highly time-consuming. Recent deep learning-based solutions require significant computational resources in addition to long training phases, and differential privacy-based solutions may undermine data utility. In this paper, we propose $\epsilon$-PrivateSMOTE, a technique designed to safeguard against re-identification and linkage attacks, particularly for cases with a high re-identification risk. Our proposal combines synthetic data generation via noise-induced interpolation with differential privacy principles to obfuscate high-risk cases. We demonstrate that $\epsilon$-PrivateSMOTE achieves competitive results in privacy risk and better predictive performance when compared to multiple traditional and state-of-the-art privacy-preservation methods, including generative adversarial networks, variational autoencoders, and differential privacy baselines. We also show that our method reduces time requirements by at least a factor of 9 and is a resource-efficient solution that ensures high performance without specialised hardware.
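The abstract does not spell out the algorithm, so the following is only a minimal sketch of what "noise-induced interpolation" could look like, not the authors' implementation. It assumes a SMOTE-style step in which each case is replaced by points interpolated towards one of its nearest neighbours, with the interpolation weight drawn from a Laplace distribution of scale 1/$\epsilon$ instead of the uniform weight used by standard SMOTE. The function name `laplace_interpolate`, its parameters, and the assumption that the rows of `X` are the high-risk cases already selected by the caller are all hypothetical. The sketch makes no claim of a formal differential privacy guarantee.

```python
import numpy as np


def laplace_interpolate(X, k=3, epsilon=1.0, per_case=2, seed=0):
    """Hypothetical helper: SMOTE-style interpolation with Laplace weights.

    X is assumed to hold numeric, already-selected high-risk cases (one per row).
    Returns per_case synthetic rows for every original row.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")

    # Pairwise Euclidean distances; a point is never its own neighbour.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = []
    for i in range(n):
        for _ in range(per_case):
            j = rng.choice(neighbours[i])
            # Assumption: one Laplace weight per attribute, scale 1/epsilon.
            # Smaller epsilon means more noise, so points land further away.
            w = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=d)
            synthetic.append(X[i] + w * (X[j] - X[i]))
    return np.vstack(synthetic)


if __name__ == "__main__":
    high_risk = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 2]], dtype=float)
    print(laplace_interpolate(high_risk, k=2, epsilon=1.0).shape)
```

Under these assumptions, lowering `epsilon` widens the Laplace distribution, which pushes synthetic points further from the originals and trades utility for stronger obfuscation.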

