Decision makers often aim to learn a treatment assignment policy under a capacity constraint on the number of agents that they can treat. When agents can respond strategically to such policies, competition arises, complicating estimation of the optimal policy. In this paper, we study capacity-constrained treatment assignment in the presence of such interference. We consider a dynamic model where the decision maker allocates treatments at each time step and heterogeneous agents myopically best respond to the previous treatment assignment policy. When the number of agents is large but finite, we show that the threshold for receiving treatment under a given policy converges to the policy's mean-field equilibrium threshold. Based on this result, we develop a consistent estimator for the policy gradient. In a semi-synthetic experiment with data from the National Education Longitudinal Study of 1988, we demonstrate that this estimator can be used for learning capacity-constrained policies in the presence of strategic behavior.
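The dynamic described above can be illustrated with a toy simulation. The following is a minimal sketch under assumptions that are not taken from the paper: abilities are standard normal, effort costs are uniform on [1, 3], observed scores carry logistic noise, and agents choose effort from a grid to maximize their probability of clearing the previous period's threshold minus a quadratic cost. The function name `simulate_threshold` and all parameter values are hypothetical. The sketch shows only how a capacity constraint turns the treatment threshold into an endogenous quantity that settles down as the population grows; it does not implement the policy gradient estimator.

```python
import numpy as np


def simulate_threshold(n, q=0.3, steps=30, e_max=1.0, seed=0):
    """Toy capacity-constrained assignment with myopic best responses.

    Every modelling choice here is an illustrative assumption.
    Returns the threshold path and the fraction treated in the last period.
    """
    rng = np.random.default_rng(seed)
    ability = rng.normal(0.0, 1.0, size=n)   # heterogeneous agents
    cost = rng.uniform(1.0, 3.0, size=n)     # heterogeneous effort costs
    efforts = np.linspace(0.0, e_max, 41)    # candidate effort levels

    threshold = 0.0
    history = []
    for _ in range(steps):
        # Agents best respond to the PREVIOUS threshold. With standard
        # logistic noise, P(score >= threshold) is a sigmoid of the gap.
        gap = ability[:, None] + efforts[None, :] - threshold
        win_prob = 1.0 / (1.0 + np.exp(-gap))
        utility = win_prob - 0.5 * cost[:, None] * efforts[None, :] ** 2
        effort = efforts[np.argmax(utility, axis=1)]

        # Observed scores, then treat the top q fraction: the new
        # threshold is the (1 - q) empirical quantile of the scores.
        score = ability + effort + rng.logistic(0.0, 1.0, size=n)
        threshold = np.quantile(score, 1.0 - q)
        history.append(threshold)

    treated_fraction = float(np.mean(score >= threshold))
    return np.array(history), treated_fraction


if __name__ == "__main__":
    for n in (200, 2_000, 20_000):
        path, frac = simulate_threshold(n)
        print(n, round(float(path[-1]), 3), round(frac, 3))
```

In this toy setting, the final thresholds from independent large populations should lie close to one another, which is the finite-sample behavior one would expect if the threshold concentrates around a single equilibrium value. Setting `e_max=0.0` switches off strategic behavior and gives a baseline threshold for comparison.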