Single Proxy Control

Negative control variables are sometimes used in non-experimental studies to detect the presence of confounding by hidden factors. An outcome is a valid negative control outcome (NCO), or more broadly a proxy for confounding, if it is influenced by the unobserved confounders of the exposure's effect on the outcome of interest but is not itself causally affected by the exposure. Tchetgen Tchetgen (2013) introduced the control outcome calibration approach (COCA) as a formal NCO counterfactual method to detect and correct for residual confounding bias. For identification, COCA treats the NCO as an error-prone proxy of the treatment-free counterfactual outcome of interest, and involves regressing the NCO on the treatment-free counterfactual, together with a rank-preserving structural model that assumes a constant individual-level causal effect. In this work, we establish nonparametric COCA identification of the average causal effect for the treated without requiring rank preservation, thereby accommodating unrestricted effect heterogeneity across units. This nonparametric identification result has important practical implications: it provides confounding control with a single proxy, in contrast to recently proposed proximal causal inference, which relies on a pair of confounding proxies for identification. For COCA estimation we propose three separate strategies: (i) an extended propensity score approach, (ii) an outcome bridge function approach, and (iii) a doubly robust approach that is unbiased if either (i) or (ii) is unbiased. Finally, we illustrate the proposed methods in an application evaluating the causal impact of a Zika virus outbreak on the birth rate in Brazil.
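To make the original rank-preserving COCA idea concrete, the following is a minimal sketch on simulated data, not the nonparametric method established in this work. It assumes a constant additive effect psi, so that the treatment-free outcome is Y(0) = Y - psi*A, and a linear model for the NCO W in which W is independent of the treatment A given Y(0). Under these assumptions, the coefficient on A in a regression of W on A and Y - psi*A is zero at the true psi, which for a linear model reduces to a closed-form ratio of coefficients. The function name, variable names, and data-generating values are all hypothetical.

```python
import numpy as np

def coca_constant_effect(y, a, w):
    """Sketch of COCA under a constant additive effect and a linear NCO model.

    Fits W = c0 + cA*A + cY*Y by least squares. If W depends on (A, Y) only
    through Y(0) = Y - psi*A, then cA = -cY * psi, so psi = -cA / cY.
    """
    X = np.column_stack([np.ones_like(y), a, y])
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    c_a, c_y = coef[1], coef[2]
    return -c_a / c_y

# Simulated data (illustrative assumption, not from the paper).
rng = np.random.default_rng(0)
n = 20_000
psi_true = 2.0

y0 = rng.normal(size=n)                                   # treatment-free outcome Y(0)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-y0))).astype(float)  # treatment depends on Y(0)
w = 0.8 * y0 + rng.normal(size=n)                         # NCO: noisy proxy of Y(0), unaffected by A
y = y0 + psi_true * a                                     # observed outcome, constant effect

psi_hat = coca_constant_effect(y, a, w)
naive = y[a == 1].mean() - y[a == 0].mean()               # confounded difference in means

print(f"naive difference in means: {naive:.3f}")
print(f"COCA estimate of psi:      {psi_hat:.3f}")
```

In this simulation the naive contrast is confounded because treatment assignment depends on Y(0), whereas the calibrated estimate uses W to recover psi. The sketch relies on the constant-effect assumption that the nonparametric result described above is designed to avoid.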
