Delayed outcomes are ubiquitous in online experimentation: treatment can affect whether an outcome occurs, when it occurs, and its realized value. To accommodate staggered entry while remaining robust to environmental nonstationarity and unit-level heterogeneity, we adopt a design-based perspective and target the sample cumulative reward in each arm as a function of calendar time. Our confidence sequences allow practitioners to continuously monitor the counterfactual incremental reward, such as revenue, that would have been realized by calendar time $t$ had all entered units been assigned to treatment rather than control. The main technical challenge is the choice of design-based filtration, complicated by the presence of asynchronous potential outcome times. We show that the inverse probability weighted (IPW) treatment-effect estimation error is not a martingale with respect to any filtration, while each arm-specific IPW estimation error is a martingale with respect to a carefully chosen arm-specific event-time filtration. We therefore construct a confidence sequence for the treatment effect by combining two arm-level confidence sequences with a union bound, and further demonstrate that this can outperform the traditional design-based variance upper bound. Finally, we characterize the class of augmentations for which the per-arm augmented IPW (AIPW) estimation error remains a martingale.
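The union-bound combination can be illustrated with a minimal sketch. It assumes each arm-level confidence sequence is already available as a list of (lower, upper) intervals indexed by calendar time, and that each was built at level $\alpha/2$ so that the combined sequence has level $\alpha$; the even split, the function name `combine_arm_sequences`, and the numbers are illustrative assumptions rather than details taken from the paper. If both arm-level sequences cover their targets at all times, the difference of cumulative rewards must lie between (treated lower minus control upper) and (treated upper minus control lower).

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper) bound at one calendar time


def combine_arm_sequences(
    treated: List[Interval], control: List[Interval]
) -> List[Interval]:
    """Combine two arm-level confidence sequences into one for the difference.

    Assumption: each input sequence covers its arm's cumulative reward
    uniformly over time with probability at least 1 - alpha/2. By the union
    bound, both cover simultaneously with probability at least 1 - alpha,
    and on that event the treated-minus-control difference lies in the
    returned intervals at every time.
    """
    if len(treated) != len(control):
        raise ValueError("sequences must cover the same calendar times")
    return [
        (lo_t - hi_c, hi_t - lo_c)
        for (lo_t, hi_t), (lo_c, hi_c) in zip(treated, control)
    ]


# Hypothetical per-arm intervals at three calendar times (made-up numbers).
treated_cs = [(8.0, 14.0), (19.0, 24.0), (31.0, 35.0)]
control_cs = [(5.0, 11.0), (14.0, 18.0), (22.0, 25.0)]
effect_cs = combine_arm_sequences(treated_cs, control_cs)
```

In this toy input the combined interval excludes zero from the second calendar time onward. The width of each combined interval is the sum of the two arm-level widths, which is the price paid for not requiring the joint estimation error to be a martingale.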