The banking sector faces challenges in using deep learning due to data sensitivity and regulatory constraints, but generative AI may offer a solution. This study therefore seeks to identify effective algorithms for generating synthetic financial transaction data. It evaluates five leading models, namely Conditional Tabular Generative Adversarial Networks (CTGAN), DoppelGANger (DGAN), Wasserstein GAN, Financial Diffusion (FinDiff), and Tabular Variational AutoEncoders (TVAE), across five criteria: fidelity, synthesis quality, efficiency, privacy, and graph structure. Although none of the algorithms can replicate the real data's graph structure, each excels in specific areas: DGAN is best suited to privacy-sensitive tasks, FinDiff and TVAE excel in data replication and augmentation, and CTGAN achieves a balance across all five criteria, making it suitable for general applications with moderate privacy concerns. These findings can guide the choice of the most suitable algorithm for a given application.