This study demonstrates the existence of a testable condition for identifying the causal effect of a treatment on an outcome in observational data. The condition relies on two sets of variables: observed covariates to be controlled for and a suspected instrument. Under a causal structure commonly found in empirical applications, the testable conditional independence of the suspected instrument and the outcome given the treatment and the covariates has two implications. First, the instrument is valid, i.e. it does not directly affect the outcome (other than through the treatment) and is unconfounded conditional on the covariates. Second, the treatment is unconfounded conditional on the covariates, such that the treatment effect is identified. We suggest tests of this conditional independence based on machine learning methods that account for covariates in a data-driven way, and we investigate their asymptotic behavior and finite-sample performance in a simulation study. We also apply our testing approach to evaluate the impact of fertility on female labor supply, using the sibling sex ratio of the first two children as the supposed instrument. By and large, the results point to a violation of our testable implication for the moderate set of socio-economic covariates considered.
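To make the testable implication concrete, the sketch below checks whether the suspected instrument Z and the outcome Y are independent given the treatment D and the covariates X. It uses a generalized covariance measure (a residual-product test in the spirit of Shah and Peters), which is not necessarily the test proposed in this study. Several assumptions are made for illustration only: a binary instrument, a binary treatment, a single covariate, and a cross-fitted polynomial least-squares regression standing in for the machine learning step. The names `gcm_test`, `simulate`, and the data-generating process are hypothetical.

```python
import math
import numpy as np


def _features(d, x, degree=3):
    """Polynomial basis in x plus d and d*x^k interactions.

    Hypothetical stand-in for a machine learning regressor; a random forest
    or lasso could be plugged in instead.
    """
    cols = [np.ones_like(x), d]
    for k in range(1, degree + 1):
        cols.append(x ** k)
        cols.append(d * x ** k)
    return np.column_stack(cols)


def _crossfit_predict(target, d, x, n_folds=5, seed=0):
    """Out-of-fold least-squares predictions of E[target | d, x]."""
    n = len(target)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    design = _features(d, x)
    pred = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        coef, *_ = np.linalg.lstsq(design[train], target[train], rcond=None)
        pred[test] = design[test] @ coef
    return pred


def gcm_test(y, z, d, x, n_folds=5, seed=0):
    """Test H0: Y independent of Z given (D, X) via residual products.

    Returns the standardized statistic and a two-sided normal p-value.
    """
    ry = y - _crossfit_predict(y, d, x, n_folds, seed)  # outcome residuals
    rz = z - _crossfit_predict(z, d, x, n_folds, seed)  # instrument residuals
    prod = ry * rz
    stat = math.sqrt(len(prod)) * prod.mean() / prod.std(ddof=1)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|T|))
    return stat, p_value


def simulate(n, direct_effect, seed):
    """Hypothetical design; direct_effect != 0 means Z affects Y directly."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rng.binomial(1, 0.5, size=n).astype(float)
    d = (0.8 * z + 0.5 * x + rng.normal(size=n) > 0).astype(float)
    y = d + x + direct_effect * z + rng.normal(size=n)
    return y, z, d, x


if __name__ == "__main__":
    for effect in (0.0, 1.0):
        y, z, d, x = simulate(2000, effect, seed=1)
        stat, p = gcm_test(y, z, d, x)
        print(f"direct effect {effect}: T = {stat:.2f}, p = {p:.3g}")
```

With no direct effect of Z on Y, the statistic should behave roughly like a standard normal draw. A direct effect (an invalid instrument) should produce a very small p-value. Cross-fitting keeps each residual out of sample, so replacing `_features` and the least-squares fit with another learner leaves the rest of the test unchanged.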