Efforts to reduce the maternal mortality rate, a key UN Sustainable Development target (SDG Target 3.1), rely largely on preventative care programs to spread critical health information to high-risk populations. These programs face two important challenges: efficiently allocating limited health resources across large beneficiary populations, and adapting to evolving policy priorities. While prior work on restless multi-armed bandits (RMABs) has demonstrated success in public health allocation tasks, it lacks the flexibility to adapt to evolving policy priorities. Concurrently, Large Language Models (LLMs) have emerged as adept, automated planners in various domains, including robotic control and navigation. In this paper, we propose DLM: a Decision Language Model for RMABs. To enable dynamic fine-tuning of RMAB policies for challenging public health settings using human-language commands, we propose using LLMs as automated planners to (1) interpret human policy preference prompts, (2) propose reward functions as code for a multi-agent RL environment for RMABs, and (3) iterate on the generated reward using feedback from RMAB simulations to effectively adapt policy outcomes. In collaboration with ARMMAN, an India-based public health organization promoting preventative care for pregnant mothers, we conduct a simulation study, showing that DLM can dynamically shape policy outcomes using only human language commands as input.
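To make the three-step loop above concrete, here is a minimal sketch in Python. It is not the authors' implementation: the LLM calls are replaced by hypothetical stubs (`propose_rewards`, `select`), the simulator is a toy with an assumed myopic policy instead of a learned RL policy, and the transition probabilities, the `older` feature, and all function names are assumptions made for illustration only.

```python
import random
from typing import Callable, Dict, List, Tuple

Features = Dict[str, bool]
Reward = Callable[[int, Features], float]  # (state, features) -> reward


def propose_rewards(prompt: str, feedback: str) -> List[Reward]:
    """Hypothetical stand-in for the LLM call.

    A real system would send `prompt` and `feedback` to a language model and
    parse the returned reward code; here two fixed candidates are returned.
    """
    return [
        lambda s, f: float(s),                                 # reward any engaged arm
        lambda s, f: float(s) * (2.0 if f["older"] else 1.0),  # upweight older arms
    ]


def simulate(reward: Reward, arms: List[Features], budget: int,
             steps: int = 200, seed: int = 0) -> Dict[str, int]:
    """Toy RMAB rollout under a myopic policy (an assumption, not the paper's RL policy)."""
    rng = random.Random(seed)
    p_act, p_passive = 0.8, 0.3  # assumed chance an arm is engaged with / without action
    engaged = {"older": 0, "younger": 0}
    for _ in range(steps):
        # Expected one-step reward gain from acting on each arm.
        gains = [(p_act - p_passive) * reward(1, f) for f in arms]
        ranked = sorted(range(len(arms)), key=lambda i: -gains[i])
        chosen = set(ranked[:budget])  # act on the top-`budget` arms only
        for i, f in enumerate(arms):
            p = p_act if i in chosen else p_passive
            state = 1 if rng.random() < p else 0
            engaged["older" if f["older"] else "younger"] += state
    return engaged


def select(prompt: str, outcomes: List[Dict[str, int]]) -> int:
    """Hypothetical stand-in for LLM reflection: pick the candidate whose
    simulated outcome is highest for the group named in the prompt."""
    group = "older" if "older" in prompt else "younger"
    return max(range(len(outcomes)), key=lambda i: outcomes[i][group])


def dlm_loop(prompt: str, arms: List[Features], budget: int,
             iterations: int = 2) -> Tuple[int, Dict[str, int]]:
    feedback = ""
    best: Tuple[int, Dict[str, int]] = (0, {})
    for _ in range(iterations):
        candidates = propose_rewards(prompt, feedback)        # steps (1) and (2)
        outcomes = [simulate(r, arms, budget) for r in candidates]
        idx = select(prompt, outcomes)                        # step (3)
        best = (idx, outcomes[idx])
        feedback = f"candidate {idx} gave engagement {outcomes[idx]}"
    return best


arms = [{"older": i >= 5} for i in range(10)]  # arms 5-9 form the "older" group
best_idx, best_outcome = dlm_loop("Prioritize older beneficiaries", arms, budget=3)
print(best_idx, best_outcome)
```

In this toy setting the prompt only changes which candidate reward is selected, and the selected reward in turn changes which arms the budget is spent on. That is the mechanism the abstract describes, reduced to its simplest form.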