The availability of rich and vast data sources has greatly advanced machine learning applications in various domains. However, data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing. Overcoming these obstacles in compliance with privacy considerations is key for technological progress in many real-world application scenarios that involve privacy-sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released, enabling privacy-preserving downstream analysis and reproducible research in sensitive domains. In recent years, various approaches have been proposed for achieving privacy-preserving high-dimensional data generation through private training of deep neural networks. In this paper, we present a novel unified view that systematizes these approaches. Our view provides a joint design space for systematically deriving methods that cater to different use cases. We then discuss the strengths, limitations, and inherent correlations between different approaches, aiming to shed light on crucial aspects and inspire future research. We conclude by presenting potential paths forward for the field of DP data generation, with the aim of steering the community toward the next important steps in advancing privacy-preserving learning.