Generating synthetic financial time series that preserve the statistical properties of real market data is essential for stress testing, risk model validation, and scenario design. Existing approaches struggle to simultaneously reproduce heavy-tailed distributions, negligible linear autocorrelation, and persistent volatility clustering. We developed a hybrid hidden Markov framework that discretized excess growth rates into Laplace quantile-defined states and augmented regime switching with a Poisson jump-duration mechanism to enforce realistic tail-state dwell times. Parameters were estimated by direct transition counting, bypassing the Baum-Welch EM algorithm and scaling to a 424-asset pipeline. Applied to ten years of daily equity data, the framework achieved high distributional pass rates both in-sample and out-of-sample while partially reproducing the volatility clustering that standard regime-switching models miss. No single model was best at everything: GARCH(1,1) better reproduced volatility clustering but failed distributional tests, while the standard HMM without jumps passed more distributional tests but could not generate volatility clustering. The proposed framework delivered the most balanced performance overall. For multi-asset generation, copula-based dependence models that preserved each asset's marginal HMM distribution substantially outperformed a Single-Index Model factor baseline on both per-asset distributional accuracy and correlation reproduction.
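The estimation step described above (Laplace quantile-defined states plus direct transition counting) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the function names are hypothetical, the Laplace fit uses the median and the mean absolute deviation (the standard maximum-likelihood estimates), the state count of 5 is arbitrary, and the toy input is simulated rather than market data. The Poisson jump-duration mechanism is not included.

```python
import bisect
import math
import random
import statistics


def laplace_quantile_edges(x, n_states):
    """Interior bin edges at equally spaced quantiles of a fitted Laplace law."""
    mu = statistics.median(x)                  # Laplace MLE for location
    b = sum(abs(v - mu) for v in x) / len(x)   # Laplace MLE for scale
    edges = []
    for k in range(1, n_states):
        d = k / n_states - 0.5
        # Laplace inverse CDF: mu - b * sign(d) * ln(1 - 2|d|)
        edges.append(mu - b * math.copysign(1.0, d) * math.log(1.0 - 2.0 * abs(d)))
    return edges


def assign_states(x, edges):
    """Map each observation to a state index in 0 .. len(edges)."""
    return [bisect.bisect_right(edges, v) for v in x]


def transition_matrix(states, n_states):
    """Row-normalized transition counts; no EM iterations are involved."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for i, j in zip(states[:-1], states[1:]):
        counts[i][j] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        if total > 0:
            matrix.append([c / total for c in row])
        else:
            # Assumption: a state that is never left gets a uniform row.
            matrix.append([1.0 / n_states] * n_states)
    return matrix


# Toy input: the difference of two exponentials is Laplace-distributed.
rng = random.Random(0)
growth = [rng.expovariate(100.0) - rng.expovariate(100.0) for _ in range(2000)]

N_STATES = 5  # hypothetical choice
edges = laplace_quantile_edges(growth, N_STATES)
states = assign_states(growth, edges)
P = transition_matrix(states, N_STATES)
```

Because each row of `P` is obtained from a single pass over the state sequence, the cost is linear in the series length, which is consistent with the abstract's point that counting scales more easily than Baum-Welch across many assets.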