Market-GAN: Adding Control to Financial Market Data Generation with Semantic Context

Financial simulators play an important role in enhancing forecasting accuracy, managing risks, and fostering strategic financial decision-making. Despite the development of financial market simulation methodologies, existing frameworks often struggle to adapt to specialized simulation contexts. We pinpoint the challenges as follows: i) current financial datasets do not contain context labels; ii) current techniques are not designed to generate financial data with context as control, a task that demands greater precision than in other modalities; iii) generating context-aligned, high-fidelity data is inherently difficult given the non-stationary, noisy nature of financial data. To address these challenges, our contributions are: i) we propose the Contextual Market Dataset with market dynamics, stock ticker, and history state as context, leveraging a market dynamics modeling method that combines linear regression and Dynamic Time Warping clustering to extract market dynamics; ii) we present Market-GAN, a novel architecture incorporating a Generative Adversarial Network (GAN) for controllable generation with context, an autoencoder for learning low-dimensional features, and supervisors for knowledge transfer; iii) we introduce a two-stage training scheme to ensure that Market-GAN captures the intrinsic market distribution with multiple objectives. In the pre-training stage, we use the autoencoder and supervisors to give the generator a better initialization for the adversarial training stage. We propose a set of holistic evaluation metrics that consider alignment, fidelity, data usability on downstream tasks, and market facts. We evaluate Market-GAN on Dow Jones Industrial Average data from 2000 to 2023 and show superior performance compared with four state-of-the-art time-series generative models.
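To make the two ingredients of the market dynamics modeling step concrete, the following is a minimal sketch, not the authors' implementation. It assumes that linear regression is used to estimate the trend (slope) of a price segment and that Dynamic Time Warping (DTW) provides the distance between segments on which a clustering step could operate. The function names, the labels `"bull"`, `"bear"`, and `"flat"`, and the `threshold` value are all hypothetical choices made for illustration; the clustering itself is omitted.

```python
import numpy as np


def segment_slope(prices):
    """Least-squares slope of a price segment, normalized by its first price."""
    y = np.asarray(prices, dtype=float)
    y = y / y[0]  # normalize so slopes are comparable across tickers
    x = np.arange(len(y))
    slope, _intercept = np.polyfit(x, y, 1)  # degree-1 fit: linear regression
    return slope


def label_dynamics(prices, threshold=0.001):
    """Map a segment to a coarse dynamics label (hypothetical labels and threshold)."""
    slope = segment_slope(prices)
    if slope > threshold:
        return "bull"
    if slope < -threshold:
        return "bear"
    return "flat"


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance with absolute-difference cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]
```

In such a sketch, pairwise `dtw_distance` values between normalized segments could feed any distance-based clustering routine, while `label_dynamics` would assign a context label to each resulting segment. How the paper actually combines the two steps is not specified in this abstract.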
