Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution: it produces artificial data that mirrors the statistical properties of the private data. While numerous techniques have been devised under differential privacy, they predominantly assume that the data is centralized. However, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a state-of-the-art central method known as AIM, we present DistAIM and FLAIM. We first show that AIM is straightforward to distribute by extending a recent approach based on secure multi-party computation. However, this approach incurs additional overhead, making it less suited to federated scenarios. We then demonstrate that naively federating AIM can lead to a substantial degradation in utility in the presence of heterogeneity. To mitigate both issues, we propose an augmented FLAIM approach that maintains a private proxy of heterogeneity. We simulate our methods across a range of benchmark datasets under different degrees of heterogeneity and show that the augmented approach can improve utility while reducing overhead.