Lossy Compression of Noisy Data for Private and Data-Efficient Learning

March 20, 2023

Berivan Isik, Tsachy Weissman

From arXiv. Published in the IEEE Journal on Selected Areas in Information Theory (JSAIT). A preliminary version was presented at the IEEE International Symposium on Information Theory (ISIT), 2022, under a slightly different title, "Learning under Storage and Privacy Constraints."

Storage-efficient privacy-preserving learning is crucial due to the increasing amounts of sensitive user data required for modern learning tasks. We propose a framework for reducing the storage cost of user data while at the same time providing privacy guarantees, without essential loss in the utility of the data for learning. Our method comprises noise injection followed by lossy compression. We show that, when the lossy compression is appropriately matched to the distribution of the added noise, the compressed examples converge, in distribution, to the noise-free training data as the sample size of the training data (or the dimension of the training data) increases. In this sense, the utility of the data for learning is essentially maintained, while storage and privacy leakage are reduced by quantifiable amounts. We present experimental results on the CelebA dataset for gender classification and find that our suggested pipeline delivers in practice on the promise of the theory: the individuals in the images are unrecognizable (or less recognizable, depending on the noise level), and the overall storage of the data is substantially reduced, with no essential loss (and in some cases a slight boost) to the classification accuracy. As an added bonus, our experiments suggest that our method yields a substantial boost to robustness in the face of adversarial test data.
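
The two-stage pipeline described in the abstract (noise injection, then lossy compression matched to the noise) can be illustrated with a toy sketch. This is not the authors' compressor: it assumes Gaussian noise and swaps in a uniform scalar quantizer whose step is chosen so that its mean squared error, under the usual high-resolution approximation `step**2 / 12`, equals the noise variance. The noise level, the toy data, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def add_gaussian_noise(xs, sigma, rng):
    """Noise injection: perturb each value with zero-mean Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in xs]


def matched_step(sigma):
    """Step whose approximate quantization MSE (step**2 / 12) equals sigma**2."""
    return sigma * math.sqrt(12.0)


def quantize(xs, step):
    """Lossy compression stand-in: store only the index of the nearest grid point."""
    return [round(x / step) for x in xs]


def dequantize(indices, step):
    """Reconstruct each value from its stored index."""
    return [i * step for i in indices]


def empirical_entropy_bits(indices):
    """Empirical entropy of the stored indices, in bits per value."""
    n = len(indices)
    counts = Counter(indices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


rng = random.Random(0)
sigma = 0.5                                           # hypothetical noise level
clean = [rng.gauss(0.0, 2.0) for _ in range(10_000)]  # toy stand-in for training data
noisy = add_gaussian_noise(clean, sigma, rng)         # stage 1: noise injection
step = matched_step(sigma)                            # match distortion to the noise
indices = quantize(noisy, step)                       # stage 2: lossy compression
recon = dequantize(indices, step)                     # what a learner would train on

print(f"step size: {step:.3f}")
print(f"bits per value after quantization: {empirical_entropy_bits(indices):.2f}")
```

Raising `sigma` coarsens the grid, so fewer bits are stored per value and less detail about any individual example survives. The sketch only shows this storage side of the trade-off; it does not reproduce the convergence result or the privacy guarantees claimed in the abstract.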
