Simulation studies play a key role in the validation of causal inference methods. The simulation results are reliable only if the study is designed according to the promised operational conditions of the method under test. Still, much of the causal inference literature tends to design over-restricted or misspecified studies. In this paper, we elaborate on the problem of improper simulation design for causal methods and compile a list of desiderata for an effective simulation framework. We then introduce partially-randomized causal simulation (PARCS), a simulation framework that meets those desiderata. PARCS synthesizes data based on graphical causal models and a wide range of adjustable parameters. There is a legible mapping from the usual causal assumptions to the parameters; thus, users can identify and specify the subset of related parameters and randomize the remaining ones to generate a range of complying data-generating processes for their causal method. The result is a more comprehensive and inclusive empirical investigation of causal claims. Using PARCS, we reproduce and extend the simulation studies of two well-known papers on causal discovery and missing data analysis to emphasize the necessity of a proper simulation design. Our results show that those papers would have improved and extended their findings had they used PARCS for simulation. The framework is also implemented as a Python package. By discussing the comprehensiveness and transparency of PARCS, we encourage causal inference researchers to utilize it as a standard tool for future work.
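The idea of specifying some parameters and randomizing the rest can be illustrated with a minimal sketch. This is not the PARCS API: the graph, the function names `sample_dgp` and `simulate`, and the parameter ranges are all hypothetical, and a linear-Gaussian model is assumed purely for simplicity. The sketch fixes the coefficient of one edge (the effect under study) and draws every other coefficient and noise scale at random, yielding several data-generating processes that all comply with the same specification.

```python
import random

# Hypothetical example graph: Z -> X, Z -> Y, X -> Y (Z confounds X and Y).
EDGES = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
NODES = ["Z", "X", "Y"]  # listed in topological order


def sample_dgp(fixed_coefs, rng, coef_range=(-3.0, 3.0), noise_range=(0.5, 2.0)):
    """Draw one linear-Gaussian data-generating process (illustrative only).

    Edge coefficients named in `fixed_coefs` are kept as specified;
    every other coefficient and all noise scales are randomized.
    """
    coefs = {}
    for edge in EDGES:
        if edge in fixed_coefs:
            coefs[edge] = fixed_coefs[edge]          # user-specified parameter
        else:
            coefs[edge] = rng.uniform(*coef_range)   # randomized parameter
    noise_sd = {node: rng.uniform(*noise_range) for node in NODES}
    return {"coefs": coefs, "noise_sd": noise_sd}


def simulate(dgp, n, rng):
    """Sample n rows by visiting the nodes in topological order."""
    rows = []
    for _ in range(n):
        values = {}
        for node in NODES:
            # Weighted sum over the parents of `node` (zero for root nodes).
            mean = sum(
                coef * values[parent]
                for (parent, child), coef in dgp["coefs"].items()
                if child == node
            )
            values[node] = mean + rng.gauss(0.0, dgp["noise_sd"][node])
        rows.append(values)
    return rows


rng = random.Random(0)
fixed = {("X", "Y"): 2.0}  # assumed effect of X on Y, held constant
dgps = [sample_dgp(fixed, rng) for _ in range(5)]
datasets = [simulate(dgp, n=200, rng=rng) for dgp in dgps]
```

Each of the five datasets shares the specified X-to-Y coefficient but differs in confounding strength and noise, so a method evaluated on all of them is tested across a range of processes rather than on a single hand-picked one.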