The Structurally Complex with Additive Parent Causality (SCARY) Dataset

Causal datasets play a critical role in advancing the field of causality. However, existing datasets often lack the complexities found in real-world data, such as selection bias, unfaithful data, and confounding. To address this gap, we propose a new synthetic causal dataset, the Structurally Complex with Additive paRent causalitY (SCARY) dataset. The dataset comprises 40 scenarios, each generated with three different seeds, so researchers can work with whichever subsets are relevant to their problem. We use two data generation mechanisms for the causal relationship between parent and child nodes: linear and mixed causal mechanisms with multiple sub-types. Our dataset generator is inspired by the Causal Discovery Toolbox and generates only additive models. The dataset has a varsortability of 0.5. The SCARY dataset provides a valuable resource for researchers to explore causal discovery under more realistic scenarios. The dataset is available at https://github.com/JayJayc/SCARY.
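To make the terms "additive model" and "varsortability" concrete, here is a minimal Python sketch. It is not the SCARY generator: the function names `sample_additive_sem` and `varsortability`, the Gaussian noise, the use of `tanh` as a nonlinear term in the "mixed" branch, and the path-counting form of the score are all assumptions chosen for illustration. In an additive model each child is a sum of functions of its parents plus noise, and varsortability is commonly described as the fraction of directed paths along which marginal variance increases (ties count one half).

```python
import numpy as np


def sample_additive_sem(adjacency, n_samples, mechanism="linear", seed=0):
    """Sample X_j = sum_i f_ij(X_i) + noise_j from a weighted DAG.

    adjacency[i, j] != 0 means an edge i -> j. Nodes are assumed to be
    topologically ordered, i.e. the matrix is strictly upper triangular.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    if not np.allclose(adjacency, np.triu(adjacency, k=1)):
        raise ValueError("adjacency must be strictly upper triangular")
    rng = np.random.default_rng(seed)
    d = adjacency.shape[0]
    X = np.zeros((n_samples, d))
    for j in range(d):
        total = rng.normal(size=n_samples)  # additive Gaussian noise (assumption)
        for i in np.flatnonzero(adjacency[:, j]):
            w = adjacency[i, j]
            # "mixed": each edge is randomly linear or nonlinear (illustrative only)
            if mechanism == "mixed" and rng.random() < 0.5:
                total += w * np.tanh(X[:, i])
            else:
                total += w * X[:, i]
        X[:, j] = total
    return X


def varsortability(X, adjacency, tol=1e-9):
    """Fraction of directed paths whose end node has higher variance than its start."""
    edges = (np.asarray(adjacency) != 0).astype(int)
    d = edges.shape[0]
    var = X.var(axis=0)
    paths_k = edges.copy()  # pairs connected by a path of length k
    n_paths, score = 0, 0.0
    for _ in range(d - 1):
        for i, j in zip(*np.nonzero(paths_k)):
            n_paths += 1
            ratio = var[j] / var[i]
            if ratio > 1 + tol:
                score += 1.0
            elif ratio > 1 - tol:
                score += 0.5  # tie in variance
        paths_k = (paths_k @ edges > 0).astype(int)
    return score / n_paths if n_paths else float("nan")


# Usage: a three-node chain 0 -> 1 -> 2 with edge weight 2.
W = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])
X = sample_additive_sem(W, n_samples=2000, mechanism="linear", seed=1)
print(varsortability(X, W))
```

With large edge weights and unit noise, variance grows along every path, so the score sits near 1 and sorting variables by variance alone recovers the causal order. A value of 0.5, as reported for SCARY, means variance order carries no such information.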

