Improving instrumental variable estimators with post-stratification

An instrumental variable (IV) is a device that encourages units in a study to be exposed to a treatment. Under a set of key assumptions, a valid instrument allows for consistent estimation of treatment effects for compliers (those who are exposed to treatment only when encouraged to do so), even in the presence of unobserved confounders. Unfortunately, popular IV estimators can be unstable in studies with a small fraction of compliers. Here, we explore post-stratifying the data using variables that predict complier status (and, potentially, the outcome) to yield better estimation and inferential properties. We outline an estimator that is a weighted average of IV estimates within each stratum, weighting the stratum estimates by their estimated proportion of compliers. We then explore the benefits of post-stratification in terms of bias reduction, variance reduction, and improved standard error estimates, providing derivations that identify the direction of bias as a function of the relative means of the compliers and non-compliers. We also provide a finite-sample asymptotic formula for the variance of the post-stratified estimators. We compare the performance of different IV approaches in simulation studies and discuss the advantages of our design-based post-stratification approach over incorporating compliance-predictive covariates into two-stage least squares regressions. Finally, we show that covariates predictive of the outcome can increase precision, but only if one is willing to make a bias-variance trade-off by down-weighting or dropping strata with few compliers. We further illustrate our methods in an application.
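To make the weighted-average estimator described above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not necessarily the exact estimator of the paper. It assumes a binary instrument `z`, a binary treatment indicator `d`, an outcome `y`, and a discrete stratum label for each unit. Within each stratum it computes the usual ratio (Wald) IV estimate and takes the difference in treatment uptake between encouraged and non-encouraged units as the estimated complier proportion. The function name `post_stratified_iv` and the choice of weights (stratum size times estimated complier proportion) are assumptions made for this example.

```python
import numpy as np


def post_stratified_iv(y, d, z, strata):
    """Weighted average of within-stratum Wald IV estimates.

    Assumed inputs (hypothetical names, not from the paper):
      y      -- outcomes
      d      -- binary treatment received (0/1)
      z      -- binary instrument / encouragement (0/1)
      strata -- discrete stratum label for each unit
    """
    y, d, z, strata = map(np.asarray, (y, d, z, strata))
    effects, weights = [], []
    for s in np.unique(strata):
        in_s = strata == s
        enc = in_s & (z == 1)   # encouraged units in this stratum
        ctl = in_s & (z == 0)   # non-encouraged units in this stratum
        # Intention-to-treat effect of the instrument on the outcome.
        itt_y = y[enc].mean() - y[ctl].mean()
        # Effect of the instrument on uptake: estimated complier proportion.
        itt_d = d[enc].mean() - d[ctl].mean()
        # Within-stratum Wald (ratio) IV estimate. No guard is included
        # for strata whose estimated complier proportion is zero.
        effects.append(itt_y / itt_d)
        # Assumed weight: estimated number of compliers in the stratum.
        weights.append(in_s.sum() * itt_d)
    effects = np.array(effects, dtype=float)
    weights = np.array(weights, dtype=float)
    estimate = float(np.sum(weights * effects) / np.sum(weights))
    return estimate, effects, weights
```

With these weights, the combined estimate equals the ratio of the size-weighted stratum effects on the outcome to the size-weighted stratum effects on uptake. Strata with an estimated complier proportion near zero produce unstable ratios, which is the situation in which the abstract suggests down-weighting or dropping strata.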
