We study the data-driven selection of causal graphical models using constraint-based algorithms, which determine the existence or non-existence of edges (causal connections) in a graph by testing a series of conditional independence hypotheses. In settings where the ultimate scientific goal is to use the selected graph to inform estimation of some causal effect of interest (e.g., by selecting a valid and sufficient set of adjustment variables), we argue that a "cautious" approach to graph selection should control the probability of falsely removing edges and prefer dense, rather than sparse, graphs. We propose a simple inversion of the usual conditional independence testing procedure: to remove an edge, test the null hypothesis that the conditional association is greater than some user-specified threshold, rather than the null of independence. This equivalence-testing formulation of independence constraints leads to a procedure with desirable statistical properties and behaviors that better match the inferential goals of certain scientific studies, for example, observational epidemiological studies that aim to estimate causal effects in the face of causal model uncertainty. We illustrate our approach on a data example from environmental epidemiology.
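As a rough illustration of the inverted test, the sketch below removes an edge only when the data reject the null hypothesis that the absolute partial correlation is at least a threshold `delta`. It rests on assumptions the abstract does not state: a Gaussian partial-correlation measure of association, the Fisher z approximation, and a two one-sided tests (TOST) construction. The function names and the default values of `delta` and `alpha` are hypothetical, chosen only for the example.

```python
import math
import numpy as np


def partial_corr(data, i, j, cond):
    """Sample partial correlation of columns i and j given the columns in cond."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)  # precision matrix of the selected variables
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def equivalence_pvalue(r, n, k, delta):
    """TOST p-value for H0: |rho| >= delta against H1: |rho| < delta.

    Assumes the Fisher z approximation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - k - 3), where k is the conditioning set size.
    """
    se = 1.0 / math.sqrt(n - k - 3)
    z = math.atanh(r)
    z_delta = math.atanh(delta)
    p_upper = norm_cdf((z - z_delta) / se)        # one-sided test of rho >= delta
    p_lower = 1.0 - norm_cdf((z + z_delta) / se)  # one-sided test of rho <= -delta
    return max(p_upper, p_lower)


def remove_edge(data, i, j, cond, delta=0.1, alpha=0.05):
    """Return True only if the data support |rho| < delta at level alpha."""
    n = data.shape[0]
    r = partial_corr(data, i, j, cond)
    return equivalence_pvalue(r, n, len(cond), delta) < alpha
```

Under this construction an edge is kept by default: with a sample partial correlation of exactly zero, a small sample does not yield enough evidence to remove the edge, whereas a large sample does. This mirrors the preference for dense graphs when the evidence is weak.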