Generating realistic synthetic sequential data is critical in real-world applications across operations research, finance, healthcare, energy systems, and scientific computing, where time-indexed observations are used for prediction, simulation, risk assessment, and data-driven decision-making. While diffusion models have achieved remarkable success in generating static data, their direct extensions to sequential settings often fail to capture temporal dependence and information structure. Designing diffusion models that simulate sequential data in an adapted manner, that is, without anticipating future information, therefore remains an open challenge. In this work, we propose a sequential forward-backward diffusion framework for adapted time series generation. Our approach progressively injects and removes noise along the sequence, conditioning on the previously generated history to ensure adaptedness. We introduce a novel score-matching objective that enables efficient parallel training. We derive rigorous statistical guarantees under a generic framework and then establish score approximation, score estimation, and distribution estimation results, using ReLU networks as a concrete instance. Empirically, we validate our method on synthetic data, including ARMA models and Gaussian processes, and demonstrate its effectiveness in constructing mean-variance optimal portfolios.
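The idea of injecting and removing noise along the sequence while conditioning only on already-generated history can be illustrated with a toy sketch. This is not the authors' method. It assumes Gaussian AR(1) data with hypothetical parameters `phi` and `sigma`, an Ornstein-Uhlenbeck forward noising dy = -y/2 ds + dW, and it replaces the learned score network with the closed-form score of the noised Gaussian conditional law. The training objective and network architecture are not shown. Each time index is produced by running the reverse-time SDE with Euler-Maruyama, conditioned only on values generated at earlier indices, so the output is adapted by construction. All function names are hypothetical.

```python
import numpy as np


def exact_conditional_score(y, s, cond_mean, cond_var):
    """Score of the OU-noised conditional law of x_t given its history.

    Hypothetical stand-in for a learned score network: for toy AR(1) data,
    x_t | x_{t-1} ~ N(cond_mean, cond_var), and OU noising keeps it Gaussian:
    y_s ~ N(alpha * cond_mean, alpha^2 * cond_var + 1 - alpha^2), alpha = exp(-s/2).
    """
    alpha = np.exp(-0.5 * s)
    var_s = alpha**2 * cond_var + 1.0 - alpha**2
    return -(y - alpha * cond_mean) / var_s


def sample_step(cond_mean, cond_var, rng, horizon=8.0, n_steps=500):
    """Reverse-time Euler-Maruyama for the forward SDE dy = -y/2 ds + dW.

    Integrates from diffusion time s = horizon back to s = 0, starting from
    the N(0, 1) reference distribution.
    """
    h = horizon / n_steps
    y = rng.standard_normal(cond_mean.shape)
    for k in range(n_steps, 0, -1):
        s = k * h
        # Reverse drift: -f(y) + g^2 * score, with f(y) = -y/2 and g = 1.
        drift = 0.5 * y + exact_conditional_score(y, s, cond_mean, cond_var)
        y = y + h * drift + np.sqrt(h) * rng.standard_normal(y.shape)
    return y


def generate_adapted_paths(n_paths, length, phi=0.8, sigma=0.5, seed=0):
    """Generate paths one time index at a time (hypothetical AR(1) toy model).

    Column t is sampled using only columns 0..t-1, so no future
    information enters the generation of any value.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros((n_paths, length))
    prev = np.zeros(n_paths)  # assumed initial condition x_{-1} = 0
    for t in range(length):
        # Condition only on the already-generated history (here, the last value).
        x[:, t] = sample_step(phi * prev, sigma**2, rng)
        prev = x[:, t]
    return x


if __name__ == "__main__":
    paths = generate_adapted_paths(n_paths=2000, length=30)
    lagged, current = paths[:, :-1].ravel(), paths[:, 1:].ravel()
    # Least-squares AR(1) coefficient of the generated data; should be near phi.
    phi_hat = np.dot(lagged, current) / np.dot(lagged, lagged)
    print(f"estimated phi: {phi_hat:.3f}")
```

Because each column is produced from earlier columns only, the first 10 columns of a 20-step run coincide with a 10-step run under the same seed. This is one simple check that the sampler does not use future information. In a learned setting, `exact_conditional_score` would be replaced by a network that takes the noisy value, the diffusion time, and the history as inputs.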