Policy Optimization for Personalized Interventions in Behavioral Health

Behavioral health interventions delivered through digital platforms have the potential to significantly improve health outcomes through education, motivation, reminders, and outreach. We study the problem of optimizing personalized interventions for patients to maximize a long-term outcome, where interventions are costly and capacity-constrained. We assume there exists a dataset collected from an initial pilot study that we can leverage. We present a new approach for this problem that we dub DecompPI, which approximates one step of policy iteration. Implementing DecompPI simply consists of a prediction task using the dataset, alleviating the need for online experimentation. DecompPI is a generic model-free algorithm that can be used irrespective of the underlying patient behavior model. We derive theoretical guarantees for a simple special case of the model that is representative of our problem setting. We establish an approximation ratio for DecompPI with respect to the improvement beyond a null policy that does not allocate interventions. Specifically, when the initial policy used to collect the data is randomized, the approximation ratio of the improvement approaches 1/2 as the intervention capacity of the initial policy decreases. We show that this guarantee is robust to estimation errors. We conduct a rigorous empirical case study using real-world data from a mobile health platform for improving tuberculosis treatment adherence. Using a validated simulation model, we demonstrate that DecompPI can provide the same efficacy as the status quo approach with approximately half the intervention capacity. DecompPI is simple and easy to implement for organizations aiming to improve long-term behavior through targeted interventions, and this paper demonstrates its strong performance both theoretically and empirically.
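
To make the idea of a single capacity-constrained policy-improvement step concrete, the following is a minimal sketch and not the DecompPI implementation itself. It assumes that the prediction task has already produced, for each patient, an estimated long-term outcome with and without an intervention today; the function name `allocate_interventions` and its arguments are hypothetical and do not come from the paper. Under those assumptions, the step ranks patients by estimated gain and intervenes on the top ones, up to the available capacity.

```python
import numpy as np


def allocate_interventions(q_intervene, q_no_intervene, capacity):
    """Pick which patients receive an intervention (illustrative sketch).

    q_intervene, q_no_intervene: hypothetical per-patient estimates of the
    long-term outcome with and without an intervention, assumed to come
    from a model fit on pilot-study data.
    capacity: maximum number of interventions available.
    """
    gain = np.asarray(q_intervene, dtype=float) - np.asarray(q_no_intervene, dtype=float)
    # Rank patients by estimated gain, largest first (stable sort for ties).
    order = np.argsort(-gain, kind="stable")
    # Use at most `capacity` interventions, and only where the estimated gain is positive.
    return [int(i) for i in order[:capacity] if gain[i] > 0]


# Toy numbers, invented purely for illustration.
q1 = [0.9, 0.5, 0.7, 0.4]   # estimated outcome if intervened on
q0 = [0.6, 0.45, 0.2, 0.5]  # estimated outcome if not intervened on
print(allocate_interventions(q1, q0, capacity=2))
```

Skipping patients whose estimated gain is not positive is a design choice of this sketch: it avoids spending costly interventions where the estimates suggest no benefit.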
