Although there is now a large literature on policy evaluation and learning, much of the prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference can lead to biased policy evaluation and ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover effects, thereby improving the overall performance of an individualized treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under clustered network interference (also known as partial interference), where clusters of units are sampled from a population and units may influence one another within each cluster. Unlike previous methods that impose strong restrictions on spillover effects, such as anonymous interference, the proposed methodology assumes only a semiparametric structural model in which each unit's outcome is an additive function of the individual treatments within the cluster. Under this model, we propose an estimator that can be used to evaluate the empirical performance of an ITR. We show that this estimator is substantially more efficient than the standard inverse probability weighting estimator, which does not impose any assumption about spillover effects. We derive a finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator improves the performance of learned policies. We consider both experimental and observational studies, and for the latter, we develop a doubly robust estimator that is semiparametrically efficient and yields an optimal regret bound. Finally, we conduct simulation and empirical studies to illustrate the advantages of the proposed methodology.
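To make the contrast between the two evaluation strategies concrete, the following is a minimal sketch, not the paper's implementation. It assumes a randomized experiment with equal-sized clusters, treatments assigned independently across units within a cluster, and known propensities; the function names `ipw_value` and `additive_value` and the array layout are hypothetical. The first function is the standard cluster-level inverse probability weighting estimator, which keeps only clusters whose entire treatment vector matches the ITR. The second is one estimator that is unbiased when outcomes are additive in the individual treatments; whether it coincides with the estimator proposed in the paper is an assumption.

```python
import numpy as np


def _prob_of_policy(pi, e):
    # Probability that each unit receives the treatment the policy prescribes.
    # e[i, j] is the (assumed known) probability that unit j in cluster i is treated.
    return np.where(pi == 1, e, 1.0 - e)


def ipw_value(Y, A, pi, e):
    """Standard cluster-level IPW estimate of the value of the policy `pi`.

    Y, A, pi, e are (n_clusters, cluster_size) arrays: outcomes, observed
    binary treatments, treatments prescribed by the policy, and propensities.
    A cluster contributes only if every unit's treatment matches the policy.
    """
    Y, A, pi, e = (np.asarray(v, dtype=float) for v in (Y, A, pi, e))
    p_pi = _prob_of_policy(pi, e)
    unit_w = np.where(A == pi, 1.0 / p_pi, 0.0)
    cluster_w = np.prod(unit_w, axis=1)  # product over units (independent assignment)
    return float(np.mean(cluster_w * Y.mean(axis=1)))


def additive_value(Y, A, pi, e):
    """Estimate that exploits an additive outcome model (illustrative assumption).

    The cluster weight is sum_j (w_j - 1) + 1, where w_j is the unit-level
    inverse probability weight, so clusters that only partly match the policy
    still contribute.
    """
    Y, A, pi, e = (np.asarray(v, dtype=float) for v in (Y, A, pi, e))
    p_pi = _prob_of_policy(pi, e)
    unit_w = np.where(A == pi, 1.0 / p_pi, 0.0)
    cluster_w = np.sum(unit_w - 1.0, axis=1) + 1.0
    return float(np.mean(cluster_w * Y.mean(axis=1)))
```

The design difference is in the cluster weight: the product of unit-level weights is zero unless the whole cluster follows the policy and grows exponentially with cluster size, whereas the sum-based weight grows linearly, which is the intuition behind the efficiency gain described above.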