As service systems grow increasingly complex and dynamic, many interventions become localized, available and taking effect only in specific states. This paper investigates experiments with local treatments on a widely used class of dynamic models, Markov Decision Processes (MDPs). In particular, we focus on utilizing the local structure to improve the efficiency of inference on the average treatment effect. We begin by demonstrating the efficiency of classical inference methods, including model-based estimation and temporal difference learning under a fixed policy, as well as classical A/B testing with general treatments. We then introduce a variance reduction technique that exploits the local treatment structure by sharing information for states unaffected by the treatment policy. Our new estimator effectively overcomes the variance lower bound for general treatments while matching the more stringent lower bound that incorporates the local treatment structure. Furthermore, for a major part of the variance, our estimator achieves an optimal linear reduction with the number of test arms. Finally, we explore scenarios with perfect knowledge of the control arm and design estimators that further improve inference efficiency.
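To make the information-sharing idea concrete, below is a minimal model-based sketch in Python, assuming a tabular MDP where each arm is run under a fixed policy and the set of locally treated states is known in advance. The function names (`local_ate`, `estimate_model`) and the specific pooling rule are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def estimate_model(trans_counts, reward_sums, visit_counts):
    """Empirical transition matrix and mean rewards from trajectory counts
    (assumes every state has at least one visit)."""
    P = trans_counts / np.maximum(visit_counts[:, None], 1)
    r = reward_sums / np.maximum(visit_counts, 1)
    return P, r

def average_reward(P, r):
    """Long-run average reward of the chain (P, r) via its stationary distribution."""
    n = P.shape[0]
    # Solve pi^T P = pi^T together with sum(pi) = 1 as a least-squares system.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(pi @ r)

def local_ate(counts, local_states):
    """counts[k] = (trans_counts, reward_sums, visit_counts) for arm k,
    with arm 0 the control.  States outside `local_states` are pooled
    across all arms, since a local treatment leaves their dynamics and
    rewards unchanged; treated states use arm-specific estimates only."""
    pooled = [sum(c[i] for c in counts) for i in range(3)]
    P_pool, r_pool = estimate_model(*pooled)
    avg = []
    for c in counts:
        P_k, r_k = estimate_model(*c)
        P, r = P_pool.copy(), r_pool.copy()
        P[local_states], r[local_states] = P_k[local_states], r_k[local_states]
        avg.append(average_reward(P, r))
    return [a - avg[0] for a in avg[1:]]  # ATE of each test arm vs. control
```

Because only the rows indexed by `local_states` differ across arms, the remaining rows are estimated from all arms' data pooled together, which is the source of the variance reduction described above.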