Causal datasets play a critical role in advancing the field of causality. However, existing datasets often lack the complexity found in real-world problems, such as selection bias, unfaithful data, and confounding. To address this gap, we propose a new synthetic causal dataset, the Structurally Complex with Additive paRent causalitY (SCARY) dataset. The dataset comprises 40 scenarios, each generated with three different seeds, so researchers can work with whichever subsets are relevant to them. We use two data generation mechanisms for the causal relationships between parent and child nodes: linear mechanisms and mixed causal mechanisms with multiple sub-types. Our dataset generator is inspired by the Causal Discovery Toolbox and generates only additive models. The dataset has a Varsortability of 0.5. SCARY provides a valuable resource for researchers to explore causal discovery under more realistic scenarios. The dataset is available at https://github.com/JayJayc/SCARY.
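To make the terms "additive model" and "Varsortability" concrete, the following is a minimal sketch and not the SCARY generator itself. It assumes a linear additive-noise model with unit-variance Gaussian noise and nodes listed in topological order, and it takes varsortability to be the fraction of directed paths along which marginal variance increases, with ties counted as one half. The function names `sample_linear_additive` and `varsortability`, the three-node chain, and the edge weights are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_linear_additive(adj, n_samples, rng):
    """Sample X_j = sum_i adj[i, j] * X_i + N_j with standard normal noise.

    Assumption: adj[i, j] != 0 means i -> j, and nodes are in topological
    order (adj is strictly upper triangular).
    """
    d = adj.shape[0]
    X = np.zeros((n_samples, d))
    for j in range(d):
        noise = rng.normal(size=n_samples)
        # Parents of j have smaller indices, so their columns are already filled.
        X[:, j] = X @ adj[:, j] + noise
    return X


def varsortability(X, adj, tol=1e-9):
    """Fraction of directed paths whose end node has larger variance than
    the start node; pairs are counted once per path length, ties count 0.5."""
    edges = (adj != 0).astype(int)
    var = X.var(axis=0)
    d = edges.shape[0]
    paths_k = edges.copy()  # paths_k[i, j] > 0: a path of length k from i to j
    n_paths = 0
    n_sorted = 0.0
    for _ in range(d - 1):
        for i, j in np.argwhere(paths_k > 0):
            n_paths += 1
            ratio = var[j] / var[i]
            if ratio > 1 + tol:
                n_sorted += 1.0
            elif ratio > 1 - tol:
                n_sorted += 0.5
        # Extend every path by one edge; keep entries as 0/1 indicators.
        paths_k = ((paths_k @ edges) > 0).astype(int)
    return n_sorted / n_paths if n_paths else float("nan")


# Illustrative chain 0 -> 1 -> 2 with hypothetical edge weights of 2.
rng = np.random.default_rng(0)
W = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])
X = sample_linear_additive(W, 5000, rng)
score = varsortability(X, W)
print(f"varsortability of the toy chain: {score:.2f}")
```

In this toy chain the variance grows along every path, so the score sits at the top of the range and the causal order could be read off the variances alone. A value of 0.5, as reported for SCARY, means that variance order carries no such information about causal order.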