Randomization Tests in Switchback Experiments

Switchback experiments, which alternate treatment and control over time, are widely used when unit-level randomization is infeasible, outcomes are observed only in aggregate, or interference between users is unavoidable. In practice, experimentation must keep pace with fast product cycles, so teams often run studies for limited durations and make decisions from modest samples. At the same time, outcomes in these time-indexed settings exhibit serial dependence, seasonality, and occasional heavy-tailed shocks, and temporal interference (carryover or anticipation) can render standard asymptotic approximations and naive randomization tests unreliable. In this paper, we develop a randomization-test framework that delivers finite-sample valid, distribution-free p-values for several null hypotheses of interest, using only the known assignment mechanism and no parametric assumptions on the outcome process. For causal effects of interest, we impose two primitive conditions, non-anticipation and a finite carryover horizon m, and construct conditional randomization tests (CRTs) based on an ex ante pooling of design blocks into "sections"; this pooling yields a tractable conditional assignment law and ensures that focal outcomes are imputable under the null. We provide diagnostics for learning the carryover window and for assessing non-anticipation, and we introduce studentized CRTs for a session-wise weak null that accommodate within-session seasonality and are asymptotically valid. Power approximations under distributed-lag effects with AR(1) noise guide design and analysis choices, and simulations show favorable size and power relative to common alternatives. The framework extends naturally to other time-indexed designs.
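
The basic mechanic of deriving a p-value from the known assignment mechanism alone can be illustrated with a minimal Monte Carlo Fisher randomization test. This is a sketch under stated assumptions, not the section-based CRT described above: it tests only the global sharp null that outcomes do not depend on the assignment path at all, and it assumes each block was assigned to treatment independently with probability 1/2. The function names, the focal-period statistic (a difference in means over periods whose current and previous m assignments all agree), and the distributed-lag data generator with AR(1) noise are hypothetical illustrations.

```python
import random
from typing import List, Sequence


def expand_blocks(block_arms: Sequence[int], block_len: int) -> List[int]:
    """Map block-level assignments (1 = treatment, 0 = control) to a per-period path."""
    return [arm for arm in block_arms for _ in range(block_len)]


def focal_contrast(y: Sequence[float], w: Sequence[int], m: int) -> float:
    """Hypothetical statistic: mean outcome over periods whose current and previous
    m assignments are all treatment, minus the same over all-control periods.
    Periods with a mixed history (within m periods after a switch) are skipped."""
    treated: List[float] = []
    control: List[float] = []
    for t in range(m, len(y)):
        window = w[t - m : t + 1]
        if all(v == 1 for v in window):
            treated.append(y[t])
        elif all(v == 0 for v in window):
            control.append(y[t])
    if not treated or not control:
        return 0.0  # degenerate draw with no focal periods in one arm
    return sum(treated) / len(treated) - sum(control) / len(control)


def randomization_pvalue(y, block_arms, block_len, m, n_draws=2000, seed=0) -> float:
    """Monte Carlo Fisher randomization test of the global sharp null (outcomes do
    not depend on the assignment path), assuming i.i.d. fair-coin block assignment."""
    n_blocks = len(block_arms)
    assert n_blocks * block_len == len(y), "outcome length must match the design"
    rng = random.Random(seed)
    observed = abs(focal_contrast(y, expand_blocks(block_arms, block_len), m))
    exceed = 0
    for _ in range(n_draws):
        # Under the sharp null, y stays fixed; only the assignment is re-drawn.
        draw = [rng.randint(0, 1) for _ in range(n_blocks)]
        if abs(focal_contrast(y, expand_blocks(draw, block_len), m)) >= observed:
            exceed += 1
    # Counting the observed assignment (+1) keeps the p-value valid in finite samples.
    return (1 + exceed) / (1 + n_draws)


def simulate_outcomes(w, lag_effects, phi, sigma, seed=1) -> List[float]:
    """Hypothetical DGP: y_t = sum_k lag_effects[k] * w_{t-k} + e_t,
    with AR(1) noise e_t = phi * e_{t-1} + N(0, sigma^2) innovation."""
    rng = random.Random(seed)
    y: List[float] = []
    e = 0.0
    for t in range(len(w)):
        e = phi * e + rng.gauss(0.0, sigma)
        effect = sum(b * w[t - k] for k, b in enumerate(lag_effects) if t - k >= 0)
        y.append(effect + e)
    return y


if __name__ == "__main__":
    design_rng = random.Random(42)
    arms = [design_rng.randint(0, 1) for _ in range(40)]  # 40 blocks
    w = expand_blocks(arms, block_len=6)  # 6 periods per block
    y_null = simulate_outcomes(w, lag_effects=[0.0], phi=0.5, sigma=1.0)
    y_alt = simulate_outcomes(w, lag_effects=[2.0, 1.0], phi=0.5, sigma=1.0)
    print("p under no effect:   ", randomization_pvalue(y_null, arms, 6, m=1))
    print("p under lag-1 effect:", randomization_pvalue(y_alt, arms, 6, m=1))
```

Because the outcomes are held fixed under the sharp null, the reference distribution comes purely from re-drawing block assignments according to the design; no model of the serial dependence in the noise is needed. Setting m = 1 here matches the assumed two-term lag profile, so every retained period has an assignment history that is constant over the carryover window.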

翻译：切换实验——随时间交替施加处理和对照——在单元级随机化不可行、结果被聚合或用户干扰不可避免时被广泛应用。实践中，实验必须支持快速的产品迭代周期，因此团队通常仅在有限时间内开展研究，并基于适度样本量做出决策。与此同时，这些时间索引场景中的结果表现出序列相关性、季节性以及偶发的重尾冲击，而时间性干扰（残留效应或预期效应）可能导致标准渐近理论和朴素随机化检验不可靠。本文提出一种随机化检验框架，该框架仅利用已知的分配机制，无需对结果过程进行参数假设，即可为多个关注的零假设提供有限样本有效且无分布依赖的p值。针对关注的因果效应，我们施加两个基本条件——非预期性和有限残留效应窗口m——并基于将设计块事前聚合为“区段”的方式构建条件随机化检验（CRT），这种方法产生了易于处理的条件分配律，并确保了焦点结果的可推算性。我们提供了用于学习残留效应窗口和评估非预期性的诊断方法，并引入了针对会话级弱零假设的学生化CRT，该检验能适应会话内季节性并保持渐近有效性。在AR(1)噪声下的分布滞后效应模型中进行功效近似，以指导设计和分析选择，仿真实验表明相较于常见替代方法，本框架在检验水平和功效方面具有优势。该框架可自然扩展至其他时间索引的实验设计。