Deep Generative Models for Synthetic Financial Data: Applications to Portfolio and Risk Modeling

From arXiv, 14 pages, submitted as a preprint. This study examines generative models, specifically Time-series Generative Adversarial Networks (TimeGAN) and Variational Autoencoders (VAEs), for creating synthetic financial data to support portfolio construction, trading analysis, and risk modeling.

Synthetic financial data provides a practical solution to the privacy, accessibility, and reproducibility challenges that often constrain empirical research in quantitative finance. This paper investigates the use of deep generative models, specifically Time-series Generative Adversarial Networks (TimeGAN) and Variational Autoencoders (VAEs), to generate realistic synthetic financial return series for portfolio construction and risk modeling applications. Using historical daily returns from the S&P 500 as a benchmark, we generate synthetic datasets under comparable market conditions and evaluate them using statistical similarity metrics, temporal structure tests, and downstream financial tasks. The study shows that TimeGAN produces synthetic data whose distributional shapes, volatility patterns, and autocorrelation behaviour are close to those observed in real returns. When used for mean-variance portfolio optimization, the synthetic datasets yield portfolio weights, Sharpe ratios, and risk levels close to those obtained from real data. The VAE trains more stably but tends to smooth extreme market movements, which affects risk estimation. Overall, the analysis supports the use of synthetic datasets as substitutes for real financial data in portfolio analysis and risk simulation, particularly when the generative model captures temporal dynamics. Synthetic data therefore provides a privacy-preserving, cost-effective, and reproducible tool for financial experimentation and model development.
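The evaluation described above (statistical similarity, temporal structure, and a downstream mean-variance task) could be sketched roughly as follows. This is a minimal illustration, not the paper's code: the abstract does not name its exact metrics, so the two-sample Kolmogorov-Smirnov statistic, excess kurtosis, and the lag-1 autocorrelation of squared returns are assumed stand-ins, and the unconstrained tangency portfolio is an assumed form of the mean-variance optimization. All function names are hypothetical, and the `real` and `synthetic` arrays are random placeholders, not S&P 500 data or TimeGAN/VAE output.

```python
import numpy as np

def autocorr(x, lag=1):
    """Sample autocorrelation of a 1-D series at the given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def excess_kurtosis(x):
    """Excess kurtosis (0 for a Gaussian); a rough proxy for tail heaviness."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

def tangency_weights(returns, risk_free=0.0):
    """Unconstrained max-Sharpe weights, proportional to inv(cov) @ (mu - rf).

    Assumption: weights are rescaled to sum to 1, which is only sensible
    when the raw weights do not sum to (nearly) zero.
    """
    mu = returns.mean(axis=0) - risk_free
    cov = np.cov(returns, rowvar=False)
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()

def sharpe_ratio(returns, weights, risk_free=0.0, periods_per_year=252):
    """Annualised Sharpe ratio of a fixed-weight portfolio of daily returns."""
    port = returns @ weights - risk_free
    return float(np.sqrt(periods_per_year) * port.mean() / port.std(ddof=1))

# Placeholder data (assumption): i.i.d. Gaussian "returns" for 3 assets.
rng = np.random.default_rng(0)
real = rng.normal(0.001, 0.01, size=(2000, 3))
synthetic = rng.normal(0.001, 0.01, size=(2000, 3))

# Distributional and temporal checks on the first asset.
print("KS statistic:", ks_statistic(real[:, 0], synthetic[:, 0]))
print("Excess kurtosis (real, synthetic):",
      excess_kurtosis(real[:, 0]), excess_kurtosis(synthetic[:, 0]))
print("Lag-1 autocorr of squared returns (real, synthetic):",
      autocorr(real[:, 0] ** 2), autocorr(synthetic[:, 0] ** 2))

# Downstream task: fit weights on each dataset, score both on the real data.
w_real = tangency_weights(real)
w_syn = tangency_weights(synthetic)
print("Weights (real):", w_real)
print("Weights (synthetic):", w_syn)
print("Sharpe on real data (real-fit, synthetic-fit):",
      sharpe_ratio(real, w_real), sharpe_ratio(real, w_syn))
```

Scoring both sets of weights on the real returns is one way to ask whether a portfolio built from synthetic data would have behaved like one built from the real series. A smoothed generator, such as the VAE behaviour the abstract describes, would be expected to show up as lower excess kurtosis in the synthetic column.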
