In online experiments where the intervention is only exposed, or "triggered", for a small subset of the population, it is critical to use variance reduction techniques to estimate treatment effects with sufficient precision to inform business decisions. Trigger-dilute analysis is often used in these situations and can reduce the sampling variance of overall intent-to-treat (ITT) effects by a factor roughly equal to the inverse of the triggering rate; for example, a triggering rate of $5\%$ corresponds to roughly a $20\times$ reduction in variance. To apply trigger-dilute analysis, one needs to know each experimental subject's counterfactual triggering status, i.e., the subject's triggering behavior under both the treatment and control conditions. In this paper, we propose an unbiased ITT estimator with reduced variance that is applicable to experiments where the counterfactual triggering status is observed only in the treatment group. Our method is based on the efficiency augmentation idea of CUPED and draws upon identification frameworks from the principal stratification and instrumental variables literature. The unbiasedness of our estimation approach relies on a testable assumption that the augmentation term used for covariate adjustment equals zero in expectation. Unlike traditional covariate adjustment or principal score modeling approaches, our estimator can incorporate both pre-experiment and in-experiment observations. We demonstrate through both a real-world experiment and simulations that our estimator can remain unbiased and achieve precision improvements as large as if triggering status were fully observed, and in some cases can even outperform trigger-dilute analysis.
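To illustrate the $20\times$ figure quoted above, the following is a minimal Monte Carlo sketch of standard trigger-dilute analysis, not of the estimator proposed in this paper. It rests on assumptions that are ours rather than the paper's: triggering status is observed in both arms, outcomes are standard normal with the same variance for triggered and non-triggered subjects, and the treatment shifts only triggered subjects by a constant. The function name `simulate` and all parameter values are hypothetical.

```python
import numpy as np

def simulate(n_per_arm=20_000, trigger_rate=0.05, effect=0.5, reps=300, seed=0):
    """Compare the plain ITT estimator with a trigger-dilute estimator.

    Assumption (illustrative only): triggering status is known in BOTH arms,
    which is exactly the information trigger-dilute analysis requires.
    """
    rng = np.random.default_rng(seed)
    itt, diluted = [], []
    for _ in range(reps):
        # Counterfactual triggering status, independent of assignment.
        trig_t = rng.random(n_per_arm) < trigger_rate
        trig_c = rng.random(n_per_arm) < trigger_rate
        # Only triggered subjects in the treatment arm receive the effect.
        y_t = rng.normal(size=n_per_arm) + effect * trig_t
        y_c = rng.normal(size=n_per_arm)
        # Plain ITT: difference in means over everyone.
        itt.append(y_t.mean() - y_c.mean())
        # Trigger-dilute: effect among triggered subjects, scaled by trigger rate.
        p_hat = (trig_t.sum() + trig_c.sum()) / (2 * n_per_arm)
        diluted.append(p_hat * (y_t[trig_t].mean() - y_c[trig_c].mean()))
    return np.array(itt), np.array(diluted)

itt, diluted = simulate()
variance_ratio = itt.var(ddof=1) / diluted.var(ddof=1)
print(f"mean ITT estimate:            {itt.mean():.4f}")
print(f"mean trigger-dilute estimate: {diluted.mean():.4f}")
print(f"variance ratio (ITT / trigger-dilute): {variance_ratio:.1f}")
```

Under these assumptions both estimators target the same ITT effect (trigger rate times the effect among triggered subjects, here $0.05 \times 0.5 = 0.025$), and the variance ratio should land in the neighborhood of $1/0.05 = 20$, up to Monte Carlo noise. The ratio would differ if triggered and non-triggered subjects had different outcome variances.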