Detecting unobserved confounders is crucial for reliable causal inference in observational studies. Existing methods require either linearity assumptions or multiple heterogeneous environments, which limits their applicability to nonlinear, single-environment settings. To bridge this gap, we propose Kernel Regression Confounder Detection (KRCD), a novel method for detecting unobserved confounding in nonlinear observational data from a single environment. KRCD leverages reproducing kernel Hilbert spaces to model complex dependencies. By comparing a standard kernel regression with a higher-order kernel regression, we derive a test statistic whose significant deviation from zero indicates unobserved confounding. Theoretically, we prove two key results. First, in the infinite-sample limit, the regression coefficients of the two regressions coincide if and only if no unobserved confounders exist. Second, the finite-sample difference between the coefficients converges in distribution to a zero-mean Gaussian with tractable variance. Extensive experiments on synthetic benchmarks and the Twins dataset demonstrate that KRCD not only outperforms existing baselines but also achieves superior computational efficiency.
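To illustrate how a zero-mean Gaussian limit with known variance turns a coefficient difference into a decision rule, the following is a minimal sketch of a generic two-sided Gaussian test. It is not the KRCD statistic itself: the function name `gaussian_deviation_test`, the scalar form of the difference `delta_hat`, the plug-in standard deviation `sigma_hat`, and the assumed root-n scaling are all illustrative assumptions that do not come from the abstract.

```python
import math


def gaussian_deviation_test(delta_hat, sigma_hat, n, alpha=0.05):
    """Two-sided z-test for a scalar difference that is asymptotically Gaussian.

    Assumption (not from the abstract): sqrt(n) * delta_hat is approximately
    N(0, sigma_hat**2) when no unobserved confounder is present.

    delta_hat : observed difference between the two regression coefficients
    sigma_hat : plug-in estimate of the asymptotic standard deviation
    n         : sample size
    alpha     : significance level
    """
    if n <= 0 or sigma_hat <= 0:
        raise ValueError("n and sigma_hat must be positive")

    # Standardize the difference under the assumed sqrt(n) rate.
    z = math.sqrt(n) * delta_hat / sigma_hat

    # Two-sided p-value of a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    # Rejecting means the deviation from zero is significant at level alpha.
    return z, p_value, p_value < alpha
```

Under these assumptions, a rejection would be read as evidence of unobserved confounding, while a failure to reject would be consistent with its absence. A vector-valued difference would call for a quadratic-form statistic with a chi-squared reference distribution instead.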