Data is the lifeblood of AI, yet much of the most valuable data remains locked in silos due to privacy concerns and regulations. As a result, AI remains heavily underutilized in many of the most important domains, including healthcare, education, and finance. Synthetic data generation (SDG), i.e., the generation of artificial data with a synthesizer trained on real data, offers an appealing way to make data available while mitigating privacy concerns. However, existing SDG-as-a-service workflows require data holders to trust providers with access to their private data. We propose FHAIM, the first fully homomorphic encryption (FHE) framework for training a marginal-based synthetic data generator on encrypted tabular data. FHAIM adapts the widely used AIM algorithm to the FHE setting using novel FHE protocols, ensuring that the private data remains encrypted throughout and is released only with differential privacy guarantees. Our empirical analysis shows that FHAIM preserves the performance of AIM while maintaining feasible runtimes.
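A marginal-based generator such as AIM works from low-dimensional marginals (counts over a few columns) rather than from individual records. As AIM is commonly described, each selected marginal is measured with calibrated Gaussian noise before being used to fit the synthetic data model. The following is a minimal plaintext sketch of that measurement step only. It does not implement FHAIM's encrypted protocols, marginal selection, or model fitting. The function names, the integer-encoded data layout, and the noise scale `sigma` are hypothetical choices for illustration.

```python
import numpy as np

def marginal_counts(data, cols, domain_sizes):
    """Count each value combination over the chosen columns.

    Assumes `data` is an (n_records, n_columns) array of integer-encoded
    categorical values, with column c taking values in range(domain_sizes[c]).
    """
    shape = tuple(domain_sizes[c] for c in cols)
    counts = np.zeros(shape, dtype=float)
    # np.add.at accumulates correctly even when the same cell index repeats.
    idx = tuple(data[:, c] for c in cols)
    np.add.at(counts, idx, 1.0)
    return counts

def noisy_marginal(data, cols, domain_sizes, sigma, rng):
    """Gaussian mechanism on a marginal: add N(0, sigma^2) noise to every cell.

    Adding or removing one record changes exactly one cell by 1, so the
    L2 sensitivity is 1; `sigma` must be calibrated to the privacy budget
    (hypothetical parameter here, not a value from the paper).
    """
    counts = marginal_counts(data, cols, domain_sizes)
    return counts + rng.normal(0.0, sigma, size=counts.shape)

# Usage: a 2-way marginal over two binary columns.
data = np.array([[0, 1], [1, 1], [0, 0], [0, 1]])
rng = np.random.default_rng(0)
print(marginal_counts(data, (0, 1), [2, 2]))
print(noisy_marginal(data, (0, 1), [2, 2], sigma=1.0, rng=rng))
```

In FHAIM the analogous counting and noise addition would have to operate on ciphertexts, which is what the paper's FHE protocols address; the plaintext version above only shows what quantity is being computed.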