The escalating prevalence of cannabis use and the associated cannabis use disorder (CUD) poses a significant public health challenge globally. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDGs). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit, which will be used in a mobile health study to deliver personalized mobile health interventions aimed at reducing cannabis use among EAs. reBandit uses random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments. Moreover, reBandit employs Empirical Bayes and optimization techniques to autonomously update its hyperparameters online. To evaluate the performance of our algorithm, we construct a simulation testbed using data from a prior study and compare against algorithms commonly used in mobile health studies. We show that reBandit performs as well as or better than all the baseline algorithms, and that the performance gap widens as population heterogeneity increases in the simulation environment, indicating its ability to adapt to a diverse population of study participants.
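To give a concrete sense of how an informative Bayesian prior can speed up learning from noisy rewards, the following is a minimal sketch of posterior (Thompson) sampling with a conjugate Gaussian prior. It is not the reBandit algorithm: it omits random effects, context features, and the Empirical Bayes hyperparameter updates, and it assumes a known noise variance. All names (`GaussianArm`, `choose_action`, `run`) and all numeric values are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class GaussianArm:
    """Posterior over one action's mean reward (illustrative, not reBandit).

    Assumes rewards are Gaussian with a known noise variance, so the
    Gaussian prior on the mean is conjugate.
    """
    prior_mean: float   # informative prior mean (assumed, e.g. from a prior study)
    prior_var: float    # prior variance; smaller means a stronger prior
    noise_var: float    # assumed known reward noise variance
    n: int = 0
    reward_sum: float = 0.0

    def posterior(self):
        """Return (mean, variance) of the Gaussian posterior on the mean reward."""
        precision = 1.0 / self.prior_var + self.n / self.noise_var
        mean = (self.prior_mean / self.prior_var
                + self.reward_sum / self.noise_var) / precision
        return mean, 1.0 / precision

    def sample(self, rng):
        """Draw one plausible mean reward from the current posterior."""
        mean, var = self.posterior()
        return rng.gauss(mean, var ** 0.5)

    def update(self, reward):
        """Record one observed reward for this action."""
        self.n += 1
        self.reward_sum += reward


def choose_action(arms, rng):
    """Thompson sampling: sample each posterior, act greedily on the draws."""
    draws = [arm.sample(rng) for arm in arms]
    return max(range(len(arms)), key=draws.__getitem__)


def run(true_means, arms, steps, noise_sd, seed=0):
    """Simulate `steps` decision times and return how often each action was chosen."""
    rng = random.Random(seed)
    counts = [0] * len(arms)
    for _ in range(steps):
        a = choose_action(arms, rng)
        reward = rng.gauss(true_means[a], noise_sd)  # simulated noisy reward
        arms[a].update(reward)
        counts[a] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical two-action setting: 0 = no intervention, 1 = send intervention.
    arms = [GaussianArm(0.0, 1.0, 4.0), GaussianArm(0.2, 1.0, 4.0)]
    print(run(true_means=[0.0, 0.5], arms=arms, steps=200, noise_sd=2.0))
```

In this sketch a tighter `prior_var` makes early decisions lean more heavily on the prior, which is the general mechanism by which an informative prior can help when per-participant data are scarce and noisy.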