While there now exists a large literature on policy evaluation and learning, much of the prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference may lead to biased policy evaluation and ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover effects, thereby improving the overall performance of an individualized treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under clustered network (or partial) interference, where clusters of units are sampled from a population and units may influence one another within each cluster. Under this model, we propose an estimator that can be used to evaluate the empirical performance of an ITR. We show that this estimator is substantially more efficient than the standard inverse probability weighting estimator, which does not impose any assumption about spillover effects. We derive a finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to improved performance of learned policies. Finally, we conduct simulation and empirical studies to illustrate the advantages of the proposed methodology.
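To make the baseline concrete, the following is a minimal sketch of a standard cluster-level inverse probability weighting (IPW) estimator of an ITR's value under partial interference. It is not the estimator proposed in this work. It assumes, purely for illustration, that treatments are assigned independently within each cluster with known unit-level propensities, that the ITR is deterministic, and that the target is the average of cluster-mean outcomes. The function name `ipw_policy_value` and its arguments are hypothetical.

```python
import numpy as np


def ipw_policy_value(treatments, outcomes, policy_treatments, propensities):
    """Cluster-level IPW estimate of the value of a deterministic ITR.

    Each argument is a list with one 1-D array per cluster:
      treatments        observed binary treatments (0/1)
      outcomes          observed unit outcomes
      policy_treatments treatments the ITR would assign (0/1)
      propensities      known P(treatment = 1) for each unit (assumed
                        independent Bernoulli assignment within a cluster)
    """
    cluster_terms = []
    for a, y, d, e in zip(treatments, outcomes, policy_treatments, propensities):
        a = np.asarray(a)
        y = np.asarray(y, dtype=float)
        d = np.asarray(d)
        e = np.asarray(e, dtype=float)

        # No spillover structure is assumed, so a cluster contributes only
        # when its entire observed treatment vector matches the ITR.
        matches = bool(np.all(a == d))

        # Joint probability of the observed treatment vector under
        # independent Bernoulli assignment.
        joint_prob = np.prod(np.where(a == 1, e, 1.0 - e))

        weight = (1.0 if matches else 0.0) / joint_prob
        cluster_terms.append(weight * y.mean())

    # Average the weighted cluster-mean outcomes over sampled clusters.
    return float(np.mean(cluster_terms))


# Toy usage with two clusters of two units each (made-up numbers).
estimate = ipw_policy_value(
    treatments=[[1, 0], [1, 1]],
    outcomes=[[2.0, 4.0], [1.0, 1.0]],
    policy_treatments=[[1, 0], [0, 1]],
    propensities=[[0.5, 0.5], [0.5, 0.5]],
)
print(estimate)
```

Because the weight is nonzero only when every unit in a cluster receives the treatment the ITR prescribes, and the joint probability in the denominator shrinks with cluster size, this baseline tends to have high variance. This is the inefficiency that motivates imposing structure on spillover effects.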