This work presents the Causal Drift Generator (CaDrift), a time-dependent synthetic data generation framework based on Structural Causal Models (SCMs). The framework can produce a virtually unlimited variety of data streams with controlled shift events and temporal dependence, making it a useful tool for evaluating methods on evolving data. CaDrift synthesizes various distributional and covariate shifts by drifting the mapping functions of the SCM, which changes the underlying cause-and-effect relationships between the features and the target. In addition, CaDrift models occasional perturbations through interventions, as defined in causal modeling. Experimental results show that, after distributional shift events, classifier accuracy tends to drop and then gradually recover, confirming the generator's effectiveness in simulating shifts. The framework is available on GitHub.
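To make the two mechanisms concrete, the following is a minimal sketch of a toy SCM-based stream, not the CaDrift implementation or its API. The function `generate_stream`, its parameters (`drift_at`, `intervene_at`, `intervene_len`), the graph (x1 -> x2 -> y, x1 -> y), and all coefficients are assumptions chosen for illustration. A drift is simulated by switching the parameters of the mapping function for the target, and a perturbation by a temporary intervention do(x2 = 2.0).

```python
import random


def generate_stream(n_samples, drift_at, intervene_at=None, intervene_len=0, seed=0):
    """Yield (t, x1, x2, y) from a toy SCM with edges x1 -> x2, x1 -> y, x2 -> y.

    Hypothetical illustration only; names and coefficients are assumptions.
    """
    rng = random.Random(seed)
    for t in range(n_samples):
        # Drift: the weights of the mapping function for y change abruptly
        # at t == drift_at, altering the feature-target relationship.
        w1, w2 = (1.0, 1.0) if t < drift_at else (-1.0, 1.0)

        # Root node: exogenous Gaussian noise.
        x1 = rng.gauss(0.0, 1.0)

        # Structural equation for x2 (a function of its parent plus noise).
        x2 = 0.5 * x1 + rng.gauss(0.0, 0.5)

        # Intervention do(x2 = 2.0): for a short window, x2 is set to a
        # constant and no longer depends on x1.
        if intervene_at is not None and intervene_at <= t < intervene_at + intervene_len:
            x2 = 2.0

        # Target: thresholded linear mapping of its parents.
        y = int(w1 * x1 + w2 * x2 > 0)
        yield t, x1, x2, y


# Example: 1000 samples, drift at t=500, intervention during t in [200, 210).
stream = list(generate_stream(1000, drift_at=500, intervene_at=200, intervene_len=10))
```

A classifier trained online on such a stream would be expected to lose accuracy right after `drift_at`, because the sign of the x1 weight flips, and to recover as it adapts. This mirrors the drop-and-recover pattern described above.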