Efforts to reduce the maternal mortality rate, a key UN Sustainable Development target (SDG Target 3.1), rely largely on preventative care programs to spread critical health information to high-risk populations. These programs face two important challenges: efficiently allocating limited health resources to large beneficiary populations, and adapting to evolving policy priorities. While prior work on restless multi-armed bandits (RMABs) has demonstrated success in public health allocation tasks, it lacks the flexibility to adapt to evolving policy priorities. Concurrently, Large Language Models (LLMs) have emerged as adept, automated planners in various domains, including robotic control and navigation. In this paper, we propose DLM: a Decision Language Model for RMABs. To enable dynamic fine-tuning of RMAB policies for challenging public health settings using human-language commands, we propose using LLMs as automated planners to (1) interpret human policy preference prompts, (2) propose reward functions as code for a multi-agent RL environment for RMABs, and (3) iterate on the generated reward functions using feedback from RMAB simulations to effectively adapt policy outcomes. In collaboration with ARMMAN, an India-based public health organization promoting preventative care for pregnant mothers, we conduct a simulation study showing that DLM can dynamically shape policy outcomes using only human language commands as input.