We study the data-driven selection of causal graphical models using constraint-based algorithms, which determine the presence or absence of edges (causal connections) in a graph by testing a series of conditional independence hypotheses. When the ultimate scientific goal is to use the selected graph to inform estimation of a causal effect of interest (e.g., by selecting a valid and sufficient set of adjustment variables), we argue that a "cautious" approach to graph selection should control the probability of falsely removing edges and prefer dense, rather than sparse, graphs. We propose a simple inversion of the usual conditional independence testing procedure: to remove an edge, test the null hypothesis that the conditional association exceeds some user-specified threshold, rather than the null of independence. This equivalence-testing formulation of independence constraints yields a procedure with desirable statistical properties and behaviors that better match the inferential goals of certain scientific studies, for example observational epidemiological studies that aim to estimate causal effects in the face of causal model uncertainty. We illustrate our approach on a data example from environmental epidemiology.
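To make the inverted test concrete, the following is a minimal sketch and not the procedure as specified in the paper. It assumes roughly Gaussian data, measures conditional association by partial correlation, and applies two one-sided tests on the Fisher z scale. The function names, the threshold `delta`, and the level `alpha` are hypothetical choices made for illustration. The edge between two variables is removed only when the null hypothesis "absolute partial correlation is at least `delta`" is rejected.

```python
import math
import numpy as np


def partial_corr(data, i, j, cond):
    """Sample partial correlation of columns i and j given the columns in cond."""
    cols = [i, j] + list(cond)
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.inv(corr)  # precision matrix of the selected variables
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def equivalence_pvalue(data, i, j, cond, delta):
    """p-value for H0: |rho| >= delta versus H1: |rho| < delta.

    Assumption: Fisher z approximation, atanh(r) ~ N(atanh(rho), 1/(n-|cond|-3)).
    """
    n = data.shape[0]
    r = partial_corr(data, i, j, cond)
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    scale = math.sqrt(n - len(cond) - 3)
    z_hat = math.atanh(r)
    bound = math.atanh(delta)
    # One-sided test of H0: rho >= delta (small when z_hat is well below bound).
    p_upper = norm_cdf((z_hat - bound) * scale)
    # One-sided test of H0: rho <= -delta (small when z_hat is well above -bound).
    p_lower = 1.0 - norm_cdf((z_hat + bound) * scale)
    # Both one-sided nulls must be rejected, so report the larger p-value.
    return max(p_upper, p_lower)


def remove_edge(data, i, j, cond, delta=0.1, alpha=0.05):
    """Remove the edge only if the association is shown to be below delta."""
    return equivalence_pvalue(data, i, j, cond, delta) < alpha


# Illustrative simulated data: X and Y depend on Z and are independent given Z.
rng = np.random.default_rng(0)
z = rng.normal(size=5000)
x = z + rng.normal(size=5000)
y = z + rng.normal(size=5000)
data = np.column_stack([x, y, z])

print("remove X-Y edge given Z: ", remove_edge(data, 0, 1, [2]))
print("remove X-Y edge marginally:", remove_edge(data, 0, 1, []))
```

Under this formulation, failing to reject leaves the edge in place, so small samples or noisy data produce denser graphs by default. That is the cautious behavior the abstract argues for.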