We study causal inference in observational studies where the instrumental variables may be invalid. We propose a novel methodology, two-stage curvature identification (TSCI), which explores the nonlinear treatment model with machine learning. The first-stage machine learning both strengthens the instrumental variable and allows adjustment for different forms of violation of the instrumental variable assumptions. TSCI requires the instrumental variable's effect on the treatment to differ from its violation form. A novel bias correction step removes the bias resulting from the potentially high complexity of machine learning. The proposed TSCI estimator is shown to be asymptotically unbiased and Gaussian even if the machine learning algorithm does not consistently estimate the treatment model. Furthermore, we design a data-dependent method to choose the best among several candidate violation forms. We apply TSCI to study the effect of education on earnings.
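The identification condition stated above (the instrument's effect on the treatment must differ from its violation form) can be illustrated with a minimal simulation sketch. This is not the TSCI estimator itself. A cubic polynomial basis stands in for the machine learning first stage, the violation form is assumed to be linear in the instrument, and the bias correction and data-dependent selection steps are omitted. The data-generating process and all function names are hypothetical.

```python
import numpy as np


def simulate(n=20000, beta=1.0, pi=0.5, seed=0):
    """Hypothetical data: nonlinear treatment model, linear IV violation."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                   # instrument
    u = rng.normal(size=n)                   # unmeasured confounder
    d = z + z**2 + u + rng.normal(size=n)    # treatment, nonlinear in z
    # pi * z is the direct (invalid) effect of the instrument on the outcome.
    y = beta * d + pi * z + u + rng.normal(size=n)
    return z, d, y


def residualize(a, basis):
    """Remove from `a` its least-squares projection onto the columns of `basis`."""
    coef, *_ = np.linalg.lstsq(basis, a, rcond=None)
    return a - basis @ coef


def naive_tsls(z, d, y):
    """Standard IV estimate that wrongly treats z as a valid instrument."""
    zc, dc, yc = z - z.mean(), d - d.mean(), y - y.mean()
    return (zc @ yc) / (zc @ dc)


def curvature_estimate(z, d, y):
    """Use only the part of the first-stage fit not explained by the violation form."""
    ones = np.ones_like(z)
    first_stage = np.column_stack([ones, z, z**2, z**3])  # stand-in for ML
    violation = np.column_stack([ones, z])                # assumed violation form
    coef, *_ = np.linalg.lstsq(first_stage, d, rcond=None)
    d_hat = first_stage @ coef
    # What remains after adjusting for the violation form is the curvature of
    # the treatment model; it is zero if the treatment model is also linear.
    d_hat_perp = residualize(d_hat, violation)
    return (d_hat_perp @ y) / (d_hat_perp @ d)


if __name__ == "__main__":
    z, d, y = simulate()
    print("naive TSLS estimate:     ", naive_tsls(z, d, y))
    print("curvature-based estimate:", curvature_estimate(z, d, y))
```

In this setup the naive estimate absorbs the direct effect of the instrument on the outcome, while the curvature-based estimate relies only on the quadratic part of the treatment model. If the treatment were linear in the instrument, the residualized first-stage fit would vanish and the effect would no longer be identified.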