Causal inference methods for observational data are valued for their wide applicability. Although numerous methods exist for removing confounding bias, they generally assume that the covariates consist solely of confounders, or make similarly naive assumptions about the covariates. Such assumptions are hard to justify in both theory and practice, particularly with high-dimensional covariates. Relaxing them and identifying the confounding covariates that truly require adjustment can make these methods considerably more useful in practice. This paper therefore proposes a General Causal Inference (GCI) framework designed for cross-sectional observational data, which precisely identifies the key confounding covariates and provides a corresponding identification algorithm. Specifically, through progressive derivations from the Markov property on directed acyclic graphs (DAGs), we conclude that the key confounding covariates are equivalent to the common root ancestors of the treatment and the outcome variable. Building on this conclusion, the GCI framework consists of a novel Ancestor Set Identification (ASI) algorithm and de-confounding inference methods. First, the ASI algorithm, which is theoretically supported by the conditional independence properties and causal asymmetry between variables, identifies the key confounding covariates. The identified covariates are then used in the de-confounding inference methods to obtain unbiased causal effect estimates, which can support informed decision-making. Extensive experiments on synthetic datasets demonstrate that the GCI framework effectively identifies the key confounding covariates and significantly improves the precision, stability, and interpretability of causal inference in observational studies.
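To make the notion of "common root ancestors of the treatment and the outcome" concrete, the following is a minimal sketch on a DAG that is already known. It is not the ASI algorithm, which the abstract says works from conditional independence and causal asymmetry in data. The toy graph, the variable names, and the helper functions `ancestors` and `common_root_ancestors` are all hypothetical. Two further assumptions are made: "root" is read as "has no parents in the graph", and paths through the treatment are ignored when collecting the outcome's ancestors, since otherwise every ancestor of the treatment would trivially be an ancestor of the outcome whenever the treatment affects the outcome.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy DAG, stored as node -> list of direct parents.
# Edges: U -> M -> T, U -> Y, V -> T, W -> Y, T -> Y.
PARENTS: Dict[str, List[str]] = {
    "U": [],               # root that reaches both T (via M) and Y
    "V": [],               # root that affects only the treatment
    "W": [],               # root that affects only the outcome
    "M": ["U"],            # intermediate node on the path U -> M -> T
    "T": ["M", "V"],       # treatment
    "Y": ["T", "U", "W"],  # outcome
}


def ancestors(
    node: str,
    parents: Dict[str, List[str]],
    blocked: FrozenSet[str] = frozenset(),
) -> Set[str]:
    """Collect all ancestors of `node`, never stepping into `blocked` nodes."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent in blocked or parent in seen:
                continue
            seen.add(parent)
            stack.append(parent)
    return seen


def common_root_ancestors(
    treatment: str, outcome: str, parents: Dict[str, List[str]]
) -> Set[str]:
    """Parentless nodes that are ancestors of both treatment and outcome.

    Assumption: ancestors of the outcome are collected without passing
    through the treatment, so instrument-like nodes (here V) are excluded.
    """
    anc_t = ancestors(treatment, parents)
    anc_y = ancestors(outcome, parents, blocked=frozenset({treatment}))
    return {v for v in anc_t & anc_y if not parents.get(v)}


print(common_root_ancestors("T", "Y", PARENTS))  # {'U'}
```

In this toy graph only `U` is returned: `V` reaches the outcome solely through the treatment, `W` does not reach the treatment, and `M` is not an ancestor of the outcome once paths through the treatment are ignored.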