Modern treatment targeting methods often rely on estimating a conditional average treatment effect (CATE) using machine learning tools. While effective at identifying who benefits from treatment at the individual level, these approaches typically overlook system-level dynamics that may arise when treatments strain shared capacity. We study the problem of targeting in Markovian systems, where treatment decisions must be made one at a time as units arrive, and early decisions can affect later outcomes through delayed or limited access to resources. We show that optimal policies in such settings compare CATE-like quantities to state-specific thresholds, where each threshold reflects the expected cumulative impact on the system of treating an additional individual in the given state. We propose an algorithm that augments standard CATE estimation with state-level value iteration to estimate these thresholds from observational data. Theoretical results establish consistency and convergence guarantees, and empirical studies demonstrate that our method improves long-run outcomes considerably relative to individual-level CATE targeting rules and generic offline reinforcement learning algorithms.
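To make the threshold idea concrete, the following is a minimal sketch on a toy congestion model, not the algorithm proposed in the paper. Everything in it is an illustrative assumption: the state is a load level from 0 to `n_states - 1`, treating an arriving unit yields its CATE but raises the load by one, the load falls by one with probability `service_prob` each period, each unit of load costs `congestion_cost` per period, and treatment is unavailable at full load. The CATE distribution and the dynamics are taken as known here, whereas the abstract describes estimating the thresholds from observational data.

```python
import numpy as np

def state_thresholds(n_states=6, tau_values=(0.0, 0.5, 1.0, 2.0),
                     tau_probs=(0.25, 0.25, 0.25, 0.25),
                     congestion_cost=0.3, service_prob=0.5,
                     discount=0.9, tol=1e-10, max_iter=10_000):
    """Value iteration on a toy load model (all parameters hypothetical).

    Returns (threshold, V): threshold[s] is the discounted value lost by
    raising the load from s to s + 1, so treating is optimal in state s
    exactly when the arriving unit's CATE is at least threshold[s].
    """
    tau = np.asarray(tau_values, dtype=float)
    p = np.asarray(tau_probs, dtype=float)
    states = np.arange(n_states)
    down = np.maximum(states - 1, 0)

    def continuation(V):
        # Discounted expected value of ending the period at each load level:
        # the load drops by one with probability service_prob.
        return discount * (service_prob * V[down] + (1 - service_prob) * V)

    def thresholds_from(V):
        stay = continuation(V)
        # Treating moves the load up by one; it is blocked at full load.
        up = np.append(stay[1:], -np.inf)
        return stay, stay - up

    V = np.zeros(n_states)
    for _ in range(max_iter):
        stay, threshold = thresholds_from(V)
        # Expected gain from treating only units whose CATE clears the threshold.
        gain = np.maximum(tau[None, :] - threshold[:, None], 0.0) @ p
        V_new = -congestion_cost * states + stay + gain
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    _, threshold = thresholds_from(V)
    return threshold, V

def treat(cate_estimate, state, threshold):
    """State-dependent targeting rule: compare the CATE to the threshold."""
    return cate_estimate >= threshold[state]

thresholds, values = state_thresholds()
```

In this toy model a plain CATE rule would treat every unit with a positive effect regardless of load, while `treat` declines a unit whenever its effect does not cover the downstream cost `thresholds[state]` of the extra load it creates. In practice `cate_estimate` would come from a fitted CATE model rather than a known distribution.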