Generative Adversarial Networks (GANs) have been used in many studies to synthesise mixed-type tabular data. Conditional Tabular GAN (CTGAN) is the most popular variant, but it struggles to navigate the risk-utility trade-off effectively. Bayesian GANs have received less attention for tabular data, although they have been explored for unstructured data such as images and text. The most widely used technique in Bayesian GANs is Markov Chain Monte Carlo (MCMC), which is computationally intensive, particularly in terms of weight storage. In this paper, we introduce Gaussian Approximation of CTGAN (GACTGAN), which integrates Stochastic Weight Averaging-Gaussian (SWAG), a Bayesian posterior approximation technique, into the CTGAN generator to synthesise tabular data while reducing computational overhead after the training phase. We demonstrate that GACTGAN yields better synthetic data than CTGAN, better preserving tabular structure and inferential statistics with lower privacy risk. These results highlight GACTGAN as a simpler, effective implementation of Bayesian tabular synthesis.
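SWAG reduces the storage cost of MCMC because it does not keep a chain of weight samples. Instead, it fits a Gaussian to weight snapshots collected along the training trajectory, storing only running first and second moments plus a small number of deviation vectors. The NumPy sketch below shows the standard SWAG formulation (diagonal plus low-rank covariance) over a flattened weight vector. It assumes snapshots are collected periodically (e.g. once per epoch after some burn-in). The class `SWAG`, its `collect`/`sample` methods and the rank parameter `max_rank` are hypothetical names for illustration, and this is not the GACTGAN implementation.

```python
import numpy as np


class SWAG:
    """Illustrative SWAG posterior over a flat weight vector (hypothetical helper)."""

    def __init__(self, num_params: int, max_rank: int = 5):
        self.n = 0                               # number of snapshots collected so far
        self.mean = np.zeros(num_params)         # running first moment of the weights
        self.sq_mean = np.zeros(num_params)      # running second moment of the weights
        self.max_rank = max_rank                 # K: deviation columns kept for the low-rank term
        self.deviations = []                     # last K deviations (theta_i - running mean)

    def collect(self, theta: np.ndarray) -> None:
        """Update the moments with one weight snapshot taken during training."""
        self.n += 1
        self.mean += (theta - self.mean) / self.n
        self.sq_mean += (theta ** 2 - self.sq_mean) / self.n
        self.deviations.append(theta - self.mean)
        if len(self.deviations) > self.max_rank:
            self.deviations.pop(0)               # keep only the K most recent deviations

    def sample(self, rng: np.random.Generator) -> np.ndarray:
        """Draw theta ~ N(mean, 0.5 * (Sigma_diag + D D^T / (K - 1)))."""
        var = np.maximum(self.sq_mean - self.mean ** 2, 0.0)  # diagonal variance, clipped at 0
        theta = self.mean + np.sqrt(var / 2.0) * rng.standard_normal(self.mean.shape)
        k = len(self.deviations)
        if k > 1:
            D = np.stack(self.deviations, axis=1)  # deviation matrix, shape (num_params, K)
            theta = theta + D @ rng.standard_normal(k) / np.sqrt(2.0 * (k - 1))
        return theta


# Usage: collect noisy snapshots of a toy 3-parameter "generator", then draw weights.
rng = np.random.default_rng(0)
swag = SWAG(num_params=3, max_rank=4)
for _ in range(20):
    swag.collect(np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(3))
theta_draw = swag.sample(rng)
```

After training, memory is two parameter-sized vectors plus `max_rank` deviation vectors, whatever the number of draws. In a GACTGAN-style setup, each draw would presumably be loaded into the CTGAN generator before sampling synthetic rows. That step is an assumption here and is not shown.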