Time-series causal discovery (TSCD) is a fundamental problem in machine learning. However, existing synthetic datasets cannot properly evaluate or predict how algorithms perform on real data. This study introduces the CausalTime pipeline, which generates time series that closely resemble real data and come with ground-truth causal graphs for quantitative performance evaluation. The pipeline starts from real observations in a specific scenario and produces a matching benchmark dataset. First, we use deep neural networks together with normalizing flows to capture realistic dynamics accurately. Second, we extract hypothesized causal graphs by performing importance analysis on the neural network or by leveraging prior knowledge. Third, we derive the ground-truth causal graphs by splitting the causal model into a causal term, a residual term, and a noise term. Finally, using the fitted network and the derived causal graph, we generate versatile time series suitable for algorithm assessment. In the experiments, we validate the fidelity of the generated data both qualitatively and quantitatively, and then benchmark existing TSCD algorithms on the generated datasets. CausalTime offers a feasible way to evaluate TSCD algorithms in real applications and can be generalized to a wide range of fields. For ease of use, we also provide a user-friendly website, hosted at www.causaltime.cc.
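The four steps of the pipeline can be illustrated with a deliberately simplified sketch. It is not the CausalTime implementation: a linear first-order autoregressive model stands in for the deep neural network, Gaussian noise stands in for the normalizing flow, and coefficient magnitude stands in for the importance analysis. All function names and parameters (`fit_var1`, `hypothesized_graph`, `split_model`, `generate`, `n_edges`) are hypothetical and chosen only for illustration. How the residual term enters the final ground-truth graph is not specified in the abstract, so the sketch only shows the additive split itself.

```python
import numpy as np


def fit_var1(x):
    """Step 1 (stand-in): least-squares fit of x[t] ~ A @ x[t-1]."""
    past, future = x[:-1], x[1:]
    # lstsq solves past @ coef = future, so coef[j, i] maps x_j(t-1) to x_i(t).
    coef, *_ = np.linalg.lstsq(past, future, rcond=None)
    return coef.T  # A[i, j] = effect of x_j(t-1) on x_i(t)


def hypothesized_graph(A, n_edges):
    """Step 2 (stand-in): keep the n_edges largest |A[i, j]| as edges."""
    flat = np.abs(A).ravel()
    keep = np.argsort(flat)[-n_edges:]
    H = np.zeros(flat.size, dtype=int)
    H[keep] = 1
    return H.reshape(A.shape)


def split_model(A, H):
    """Step 3: split the fitted model into a causal part and a residual part."""
    A_causal = A * H         # contributions from parents listed in H
    A_resid = A - A_causal   # everything the graph mask dropped
    return A_causal, A_resid


def generate(A_causal, A_resid, noise_std, x0, steps, rng):
    """Step 4: roll out x[t] = causal term + residual term + noise term."""
    xs = [x0]
    for _ in range(steps):
        prev = xs[-1]
        noise = rng.normal(0.0, noise_std, size=prev.shape)
        xs.append(A_causal @ prev + A_resid @ prev + noise)
    return np.stack(xs)


# Toy "real" observations drawn from a known sparse linear system (assumption).
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.5, 0.0],
                   [0.0, 0.3, 0.5]])
real = [np.zeros(3)]
for _ in range(500):
    real.append(A_true @ real[-1] + rng.normal(0.0, 0.1, size=3))
real = np.stack(real)

A_hat = fit_var1(real)
H = hypothesized_graph(A_hat, n_edges=5)
A_causal, A_resid = split_model(A_hat, H)
synthetic = generate(A_causal, A_resid, noise_std=0.1, x0=real[-1], steps=200, rng=rng)
```

In this toy version the causal and residual parts sum back to the fitted model exactly, so the generated series follows the same fitted dynamics as the data while `H` records which lagged dependencies are treated as causal edges.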