The number of artificial intelligence algorithms for learning causal models from data is growing rapidly. Most ``causal discovery'' or ``causal structure learning'' algorithms are validated primarily through simulation studies. However, no widely accepted simulation standards exist, and publications often report conflicting performance statistics, even when only publications that simulate data from linear models are considered. In response, several manuscripts have criticized a popular simulation design for validating algorithms in the linear case. We propose a new simulation design for generating linear models for directed acyclic graphs (DAGs): the DAG-adaptation of the Onion (DaO) method. DaO simulations are fundamentally different from existing simulations because they prioritize the distribution of correlation matrices rather than the distribution of linear effects. Specifically, the DaO method uniformly samples the space of all correlation matrices consistent with (i.e., Markov to) a DAG. We also discuss how to sample DAGs and present methods for generating DAGs with scale-free in-degree or out-degree. We compare the DaO method against two alternative simulation designs and provide implementations of the DaO method in Python and R: https://github.com/bja43/DaO_simulation. We encourage others to adopt DaO simulations as a fair universal benchmark.
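To make the contrast between sampling linear effects and sampling correlation matrices concrete, the following is a minimal sketch of the conventional, effects-first kind of design. It is not the DaO method and is not taken from the linked repository. The function names `random_dag` and `implied_correlation`, the edge probability, the weight range of 0.5 to 2.0, and the unit error variances are all illustrative assumptions. The sketch draws a DAG, draws edge weights, and then computes the correlation matrix that those choices imply, so the distribution of correlation matrices is a by-product rather than something the design controls.

```python
import numpy as np


def random_dag(p, edge_prob, rng):
    """Hypothetical helper: random DAG as a strictly lower-triangular
    boolean matrix, where adj[i, j] means an edge j -> i (only for j < i)."""
    return np.tril(rng.random((p, p)) < edge_prob, k=-1)


def implied_correlation(adj, rng, low=0.5, high=2.0):
    """Hypothetical helper: sample linear effects on the edges of `adj`
    and return the implied correlation matrix and the weight matrix B.

    Assumes the model X = B X + e with Cov(e) = I (unit error variances),
    so Cov(X) = (I - B)^{-1} (I - B)^{-T}.
    """
    p = adj.shape[0]
    # Edge weights: magnitude uniform on [low, high] with a random sign
    # (an assumed range, chosen only for illustration).
    magnitudes = rng.uniform(low, high, size=(p, p))
    signs = rng.choice([-1.0, 1.0], size=(p, p))
    B = np.where(adj, magnitudes * signs, 0.0)

    # I - B is unit lower triangular, hence always invertible.
    A = np.linalg.inv(np.eye(p) - B)
    cov = A @ A.T

    # Rescale the covariance matrix to a correlation matrix.
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd), B


rng = np.random.default_rng(0)
adj = random_dag(5, 0.5, rng)
corr, B = implied_correlation(adj, rng)
print(np.round(corr, 3))
```

In a design like this, the analyst chooses the distribution of `B` and accepts whatever distribution over `corr` follows from it. The DaO method described in the abstract reverses that priority by sampling the correlation matrix directly, subject to it being Markov to the DAG.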