Synthetic time series are often used in practical applications to augment historical time series datasets for better performance of machine learning algorithms, to amplify the occurrence of rare events, and to create counterfactual scenarios described by time series. Distributional similarity (which we refer to as realism) and the satisfaction of certain numerical constraints are common requirements in counterfactual time series scenario generation requests. For instance, the US Federal Reserve publishes synthetic market stress scenarios, given as constrained time series, for financial institutions to assess their performance in hypothetical recessions. Existing approaches for generating constrained time series usually enforce constraints by adding a penalty to the training loss and by rejecting non-conforming samples. However, these approaches require re-training whenever the constraints change, and rejection sampling can be computationally expensive or impractical for complex constraints. In this paper, we propose a novel set of methods that tackle the constrained time series generation problem and provide efficient sampling while ensuring the realism of the generated time series. In particular, we frame the problem using a constrained optimization framework and then propose a set of generative methods, including ``GuidedDiffTime'', a guided diffusion model for generating realistic time series. Empirically, we evaluate our work on several financial and energy datasets, where incorporating constraints is critical. We show that our approaches outperform existing work both qualitatively and quantitatively. Most importantly, we show that our ``GuidedDiffTime'' model is the only solution that does not require re-training for new constraints, resulting in a significant reduction in carbon footprint.
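To illustrate why a guided diffusion sampler avoids re-training when constraints change, here is a minimal sketch of one generic gradient-guidance heuristic. It is not the authors' GuidedDiffTime: at every reverse DDPM step, the posterior mean is shifted down the gradient of a constraint penalty evaluated at the predicted clean series. Several pieces are assumptions for illustration only. The noise predictor `toy_eps_model` is a stand-in for a trained network (it is the exact optimal predictor when the clean data is standard normal). The constraint "the last value of the series equals `target`" and the penalty `endpoint_penalty_grad` are hypothetical. The linear beta schedule and `guidance_scale` are arbitrary choices. The key property is that the constraint only enters `guided_sample` as an argument, so the model never sees it during training.

```python
import numpy as np


def make_schedule(n_steps=100, beta_start=1e-4, beta_end=0.02):
    # Linear DDPM noise schedule (an illustrative choice, not from the paper).
    betas = np.linspace(beta_start, beta_end, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(x_t, alpha_bar):
    # Hypothetical stand-in for a trained denoiser: this is the exact
    # E[eps | x_t] when the clean data is N(0, I).
    return np.sqrt(1.0 - alpha_bar) * x_t


def endpoint_penalty_grad(x0_hat, target):
    # Gradient of the hypothetical penalty (x0[-1] - target)**2, which
    # encodes the constraint "the series must end at `target`".
    grad = np.zeros_like(x0_hat)
    grad[-1] = 2.0 * (x0_hat[-1] - target)
    return grad


def guided_sample(length, target, guidance_scale, n_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(n_steps)
    x = rng.standard_normal(length)  # start from pure noise
    for t in reversed(range(n_steps)):
        eps_hat = toy_eps_model(x, alpha_bars[t])
        # Predicted clean series implied by the current noisy sample.
        x0_hat = (x - np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])
        # Standard DDPM posterior mean.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # Guidance: nudge the mean toward satisfying the constraint.
        mean = mean - guidance_scale * endpoint_penalty_grad(x0_hat, target)
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(length)
        else:
            x = mean  # no noise at the final step
    return x


if __name__ == "__main__":
    unguided = guided_sample(24, target=3.0, guidance_scale=0.0)
    guided = guided_sample(24, target=3.0, guidance_scale=0.1)
    print("unguided end value:", unguided[-1])
    print("guided end value:  ", guided[-1])
```

Switching to a different constraint here means passing a different penalty gradient or `target` to the sampler; the (stand-in) denoiser is left unchanged. This is the property the abstract contrasts with loss-penalty training and rejection sampling.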