Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. Although numerous techniques have been devised under differential privacy, they predominantly assume that the data is centralized. In practice, however, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a state-of-the-art central method known as AIM, we present DistAIM and FLAIM. We first show that it is straightforward to distribute AIM by extending a recent approach based on secure multi-party computation, although this necessitates additional overhead, making it less suited to federated scenarios. We then demonstrate that naively federating AIM can substantially degrade utility in the presence of heterogeneity. To mitigate both issues, we propose an augmented FLAIM approach that maintains a private proxy of heterogeneity. We simulate our methods across a range of benchmark datasets under different degrees of heterogeneity and show that we can improve utility while reducing overhead.
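To make the federated setting concrete, the following is a minimal sketch of how a noisy one-way marginal might be measured when a categorical column is split across clients. It is not the DistAIM or FLAIM algorithm. The function names, the toy client data, and the noise scale `sigma` are hypothetical, and `sigma` is not calibrated to any privacy budget. The sum over clients stands in for a secure aggregation step.

```python
import numpy as np

def local_marginal(column, domain_size):
    """Histogram of one categorical column held by a single client."""
    return np.bincount(column, minlength=domain_size).astype(float)

def noisy_federated_marginal(client_columns, domain_size, sigma, rng):
    """Sum per-client histograms, then add Gaussian noise to the total."""
    total = np.zeros(domain_size)
    for column in client_columns:
        total += local_marginal(column, domain_size)
    # Assumption: in a real deployment this sum would be computed under
    # secure aggregation, so that no individual histogram is revealed.
    return total + rng.normal(0.0, sigma, size=domain_size)

rng = np.random.default_rng(0)
# Three hypothetical clients whose local data is skewed (heterogeneous).
clients = [np.array([0, 0, 0, 1]), np.array([1, 1, 2]), np.array([2, 2, 2, 2, 0])]

exact = noisy_federated_marginal(clients, 3, sigma=0.0, rng=rng)       # all clients, no noise
noisy = noisy_federated_marginal(clients, 3, sigma=1.0, rng=rng)       # all clients, with noise
partial = noisy_federated_marginal(clients[:1], 3, sigma=0.0, rng=rng)  # first client only
```

Comparing `partial` with `exact` after normalizing each gives a rough sense of why heterogeneity matters: a marginal measured on a subset of skewed clients can look quite different from the global one, even before any noise is added.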