Contextual multi-armed bandit (cMAB) algorithms offer a promising framework for adapting behavioral interventions to individuals over time. However, cMABs often require large samples to learn effectively and typically rely on a finite, pre-specified set of fixed message templates. In this paper, we present a hybrid cMABxLLM approach in which the cMAB selects an intervention type and a large language model (LLM) personalizes the message content within the selected type. We deployed this approach in a 30-day physical-activity intervention, comparing four behavior-change intervention types: behavioral self-monitoring, gain-framing, loss-framing, and social comparison, delivered as daily motivational messages designed to support motivation and the achievement of a daily step goal. Message content was personalized using dynamic contextual factors, including daily fluctuations in self-efficacy, social influence, and regulatory focus. Over the trial, participants received daily messages assigned by one of five models: equal randomization (RCT), cMAB only, LLM only, LLM with interaction history, or cMABxLLM. Outcomes included motivation toward physical activity and message usefulness, both assessed via ecological momentary assessments (EMAs). We evaluated and compared the five delivery models using pre-specified statistical analyses that account for repeated measures and time trends. We find that the cMABxLLM approach retains the perceived acceptability of LLM-generated messages while reducing token usage and providing an explicit, reproducible decision rule for intervention selection. The hybrid approach also mitigates skew in intervention delivery by improving coverage of under-delivered intervention types. More broadly, our approach provides a deployable template for combining Bayesian adaptive experimentation with generative models in a way that supports both personalization and interpretability.
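To make the cMAB side of the hybrid decision rule concrete, the sketch below shows one common way such a selector can be implemented: per-arm Bayesian linear regression with Thompson sampling over the four intervention types, using daily context features. This is an illustrative sketch under assumed modeling choices (linear reward model, Gaussian priors, simulated rewards), not the paper's actual implementation; all names and the reward simulation are hypothetical.

```python
import numpy as np

# Hypothetical arm labels matching the four intervention types in the study.
ARMS = ["self_monitoring", "gain_framing", "loss_framing", "social_comparison"]


class LinearThompsonBandit:
    """Per-arm Bayesian linear regression with Thompson sampling.

    Each arm keeps a Gaussian posterior over a weight vector mapping
    daily context (e.g. self-efficacy, social influence, regulatory
    focus -- feature names are illustrative) to expected reward.
    """

    def __init__(self, n_arms, dim, prior_var=1.0, noise_var=1.0):
        self.noise_var = noise_var
        # Ridge-style prior: precision matrix A and vector b per arm.
        self.A = [np.eye(dim) / prior_var for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, context, rng):
        # Draw one weight vector from each arm's posterior and pick
        # the arm whose sample predicts the highest reward.
        scores = []
        for A, b in zip(self.A, self.b):
            cov = np.linalg.inv(A)
            mean = cov @ b
            theta = rng.multivariate_normal(mean, self.noise_var * cov)
            scores.append(context @ theta)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        # Conjugate Bayesian update of the chosen arm's posterior.
        self.A[arm] += np.outer(context, context) / self.noise_var
        self.b[arm] += reward * context / self.noise_var


rng = np.random.default_rng(0)
bandit = LinearThompsonBandit(n_arms=len(ARMS), dim=3)
for day in range(200):
    # Simulated daily context in [0, 1]^3 (self-efficacy, social
    # influence, promotion focus -- purely for illustration).
    ctx = rng.uniform(0.0, 1.0, size=3)
    arm = bandit.select(ctx, rng)
    # Simulated outcome: gain-framing pays off when self-efficacy
    # (ctx[0]) is high; other arms have flat expected rewards.
    true_means = [0.3, 0.2 + 0.6 * ctx[0], 0.25, 0.3]
    reward = true_means[arm] + rng.normal(0.0, 0.1)
    bandit.update(arm, ctx, reward)
```

In the hybrid pipeline, the selected arm label (e.g. `ARMS[arm]`) would be inserted into the LLM prompt, which then generates the personalized message text for that intervention type; the bandit, not the LLM, owns the explicit and reproducible selection decision.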