The generation of synthetic tabular data that preserves differential privacy is a problem of growing importance. While traditional marginal-based methods have achieved impressive results, recent work has shown that deep learning-based approaches tend to lag behind. In this work, we present Differentially-Private TaBular AutoRegressive Transformer (DP-TBART), a transformer-based autoregressive model that maintains differential privacy and achieves performance competitive with marginal-based methods on a wide variety of datasets, even outperforming state-of-the-art methods in certain settings. We also provide a theoretical framework for understanding the limitations of marginal-based approaches and where deep learning-based approaches stand to contribute most. These results suggest that deep learning-based techniques should be considered a viable alternative to marginal-based methods for generating differentially private synthetic tabular data.