Mean-based estimators of the causal effect in a completely randomized experiment (e.g., the difference-in-means estimator) may behave poorly if the potential outcomes are heavy-tailed or contain outliers. We study an alternative estimator by Rosenbaum that estimates the constant additive treatment effect by inverting a randomization test based on ranks. By investigating the breakdown point and asymptotic relative efficiency of this rank-based estimator, we show that it is provably robust against heavy-tailed potential outcomes and that its asymptotic variance is, in the worst case, at most about 1.16 times that of the difference-in-means estimator; its variance can be much smaller when the potential outcomes are not light-tailed. We further derive a consistent estimator of the asymptotic standard error of Rosenbaum's estimator, which yields a readily computable confidence interval for the treatment effect. We also study a regression-adjusted version of Rosenbaum's estimator that incorporates additional covariate information in randomization inference, and we prove that this regression adjustment yields a gain in efficiency under a linear regression model. We illustrate through synthetic and real data that, unlike the mean-based estimators, these rank-based estimators (both unadjusted and regression-adjusted) are efficient and robust against heavy-tailed distributions, contamination, and model misspecification.
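To make the idea of inverting a rank test concrete, here is a minimal sketch, not the paper's implementation. It assumes the test statistic is the Wilcoxon rank-sum statistic, in which case the inverted point estimate of a constant additive effect coincides with the Hodges-Lehmann estimate: the median of all treated-minus-control pairwise differences. The function names `hodges_lehmann_shift` and `difference_in_means`, and the simulated Cauchy outcomes, are illustrative assumptions rather than anything specified in the abstract.

```python
import numpy as np


def hodges_lehmann_shift(y, z):
    """Rank-based shift estimate (assumption: Wilcoxon rank-sum inversion).

    Returns the median of all pairwise differences between treated and
    control outcomes, which is the Hodges-Lehmann two-sample estimate.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=bool)
    treated, control = y[z], y[~z]
    # All treated-minus-control differences, shape (n_treated, n_control).
    diffs = treated[:, None] - control[None, :]
    return float(np.median(diffs))


def difference_in_means(y, z):
    """Mean-based estimate: treated mean minus control mean."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=bool)
    return float(y[z].mean() - y[~z].mean())


# Illustrative simulation (hypothetical setup): completely randomized
# assignment, heavy-tailed control outcomes, constant additive effect tau.
rng = np.random.default_rng(0)
n, tau = 200, 2.0
z = np.zeros(n, dtype=bool)
z[rng.choice(n, size=n // 2, replace=False)] = True
y = rng.standard_cauchy(n) + tau * z

print("rank-based estimate:      ", hodges_lehmann_shift(y, z))
print("difference-in-means value:", difference_in_means(y, z))
```

The pairwise-difference form uses memory proportional to the number of treated units times the number of control units, which is fine for moderate sample sizes. A single extreme outcome can move the difference in means arbitrarily far, while the median of pairwise differences changes little, which is the kind of robustness the breakdown-point analysis formalizes.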