Contextual multi-armed bandit (cMAB) algorithms offer a promising framework for adapting behavioral interventions to individuals over time. However, cMABs often require large samples to learn effectively and typically rely on a finite set of pre-specified, fixed message templates. In this paper, we present a hybrid cMABxLLM approach in which the cMAB selects an intervention type and a large language model (LLM) personalizes the message content within the selected type. We deployed this approach in a 30-day physical-activity intervention comparing four behavior-change intervention types: behavioral self-monitoring, gain-framing, loss-framing, and social comparison, delivered as daily motivational messages to support motivation and a daily step-count goal. Message content is personalized using dynamic contextual factors, including daily fluctuations in self-efficacy, social influence, and regulatory focus. Over the trial, participants received daily messages assigned by one of five models: equal randomization (RCT), cMAB only, LLM only, LLM with interaction history, or cMABxLLM. Outcomes include motivation toward physical activity and message usefulness, assessed via ecological momentary assessments (EMAs). We evaluate and compare the five delivery models using pre-specified statistical analyses that account for repeated measures and time trends. We find that the cMABxLLM approach retains the perceived acceptability of LLM-generated messages while reducing token usage and providing an explicit, reproducible decision rule for intervention selection. The hybrid approach also mitigates skew in intervention delivery by improving support for under-delivered intervention types. More broadly, our approach provides a deployable template for combining Bayesian adaptive experimentation with generative models in a way that supports both personalization and interpretability.
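The hybrid pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the cMAB algorithm, so the snippet below assumes a Thompson-sampling contextual bandit with a Bayesian linear reward model per arm, contexts drawn from the three daily EMA factors (self-efficacy, social influence, regulatory focus), and a hypothetical prompt hand-off to the LLM; all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# The four behavior-change intervention types (the cMAB's arms).
ARMS = ["self-monitoring", "gain-framing", "loss-framing", "social comparison"]

# Per-arm Bayesian linear-regression posterior over the 3 EMA context
# features; the paper's exact model is an assumption here.
d = 3
A = [np.eye(d) for _ in ARMS]        # posterior precision matrices
b = [np.zeros(d) for _ in ARMS]      # reward-weighted context sums

def select_arm(context):
    """Thompson sampling: sample coefficients from each arm's posterior
    and pick the arm with the highest sampled expected reward."""
    sampled_rewards = []
    for A_i, b_i in zip(A, b):
        cov = np.linalg.inv(A_i)
        theta = rng.multivariate_normal(cov @ b_i, cov)
        sampled_rewards.append(context @ theta)
    return int(np.argmax(sampled_rewards))

def update(arm, context, reward):
    """Conjugate Bayesian linear-regression update for the chosen arm."""
    A[arm] += np.outer(context, context)
    b[arm] += reward * context

# One simulated day: EMA context -> intervention type -> LLM prompt.
context = np.array([0.6, 0.3, 0.8])  # illustrative EMA scores in [0, 1]
arm = select_arm(context)
prompt = f"Write a motivational {ARMS[arm]} message for today's step goal."
update(arm, context, reward=1.0)     # e.g., participant rated the message useful
```

In this division of labor, the bandit's explicit posterior update is what makes intervention selection reproducible and auditable, while the LLM only personalizes wording within the selected type.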