This paper studies the challenging problem of estimating causal effects from observational data in the presence of unobserved confounders. The two-stage least squares (TSLS) method and its variants, which use a standard instrumental variable (IV), are commonly applied to eliminate confounding bias, including the bias caused by unobserved confounders, but they rely on a linearity assumption. In addition, the unconfounded-instrument condition imposed on a standard IV is too strict to be practical. To address these two limitations of the standard IV method (the linearity assumption and the strict unconfounded-instrument condition), we use a conditional IV (CIV) to relax the unconfounded-instrument condition and propose CBRL.CIV, a non-linear CIV regression with Confounding Balancing Representation Learning, which jointly eliminates the confounding bias from unobserved confounders and balances the observed confounders without assuming linearity. We theoretically demonstrate the soundness of CBRL.CIV. Extensive experiments on synthetic data and two real-world datasets show that CBRL.CIV performs competitively against state-of-the-art IV-based estimators and is superior in non-linear settings.
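For context, the linear TSLS baseline that the abstract contrasts with can be sketched as follows. This is a minimal illustration on synthetic data, not the CBRL.CIV method itself. The data-generating process, the coefficient values, and the function names `tsls` and `ols` are assumptions made for this example only.

```python
import numpy as np


def ols(w, y):
    """Slope of a simple least-squares regression of y on w (with intercept)."""
    X = np.column_stack([np.ones_like(w), w])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def tsls(z, w, y):
    """Two-stage least squares with a single instrument z.

    Stage 1 regresses the treatment w on z; stage 2 regresses the
    outcome y on the stage-1 prediction of w.
    """
    Z = np.column_stack([np.ones_like(z), z])
    gamma, *_ = np.linalg.lstsq(Z, w, rcond=None)
    w_hat = Z @ gamma  # part of w explained by the instrument
    return ols(w_hat, y)


# Hypothetical linear data-generating process (all values are assumptions).
rng = np.random.default_rng(0)
n = 20_000
u = rng.normal(size=n)                      # unobserved confounder
z = rng.normal(size=n)                      # standard (unconfounded) instrument
w = 1.0 * z + 1.0 * u + rng.normal(size=n)  # treatment
y = 2.0 * w + 3.0 * u + rng.normal(size=n)  # outcome; true effect of w is 2.0

naive = ols(w, y)    # biased upward, because u drives both w and y
iv = tsls(z, w, y)   # close to the true effect under linearity

print(f"naive OLS estimate: {naive:.3f}")
print(f"TSLS estimate:      {iv:.3f}")
```

In this sketch the instrument is unconfounded by construction and the outcome model is linear, which are exactly the two conditions the abstract describes as restrictive. If either is violated, the TSLS estimate above would no longer be expected to recover the true effect.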