Public health programs often provide interventions to encourage beneficiary adherence, and allocating these interventions effectively is vital for producing the greatest overall health outcomes. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requiring online reinforcement learning (RL). We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that novelly combines Bayesian modeling techniques with Thompson sampling to flexibly model the complex RMAB settings that arise in public health adherence problems, such as context and non-stationarity. BCoR's key strength is its ability to leverage shared information within and between arms to learn the unknown RMAB transition dynamics quickly in intervention-scarce settings with relatively short time horizons, which are common in public health applications. Empirically, BCoR achieves substantially higher finite-sample performance across a range of experimental settings, including an example based on real-world adherence data developed in collaboration with ARMMAN, an NGO in India that runs a large-scale maternal health program, showcasing BCoR's practical utility and potential for real-world deployment.
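To make the setting concrete, the following is a minimal, hypothetical sketch of Thompson sampling for a budgeted restless bandit with unknown transition dynamics. It is not the BCoR algorithm from the paper (it has no context features, hierarchical sharing across arms, or non-stationarity); it only illustrates the underlying idea of maintaining Beta posteriors over per-arm transition probabilities, sampling a model each round, and spending a limited intervention budget on the arms with the largest sampled benefit. All variable names and the toy environment are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy restless bandit: N arms, 2 states (0 = non-adherent, 1 = adherent),
# 2 actions (0 = no intervention, 1 = intervene), budget interventions per step.
# true_p[i, s, a] is the unknown probability arm i moves to state 1.
N, T, budget = 5, 200, 2
true_p = rng.uniform(0.2, 0.8, size=(N, 2, 2))

# Beta(1, 1) priors on every transition probability; Thompson sampling
# draws one plausible model per step and acts greedily under that draw.
alpha = np.ones((N, 2, 2))
beta = np.ones((N, 2, 2))
state = np.zeros(N, dtype=int)

adherent_steps = 0
for t in range(T):
    # Sample a model from the posterior and score each arm by the sampled
    # myopic gain of intervening in its current state.
    p_hat = rng.beta(alpha, beta)
    gain = p_hat[np.arange(N), state, 1] - p_hat[np.arange(N), state, 0]
    act = np.zeros(N, dtype=int)
    act[np.argsort(-gain)[:budget]] = 1  # spend the budget on the top arms

    # Environment transitions; update the posterior for each (arm, state, action).
    next_state = (rng.random(N) < true_p[np.arange(N), state, act]).astype(int)
    alpha[np.arange(N), state, act] += next_state
    beta[np.arange(N), state, act] += 1 - next_state
    state = next_state
    adherent_steps += int(state.sum())

print(adherent_steps)  # total adherent arm-steps accumulated over the horizon
```

BCoR's contribution, per the abstract, is replacing the independent per-arm Beta priors above with a Bayesian model that shares information within and between arms and incorporates context and non-stationarity, which is what enables fast learning over short horizons.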