Promoting healthy lifestyle behaviors remains a major public health concern, particularly because of their crucial role in preventing chronic conditions such as cancer, heart disease, and type 2 diabetes. Mobile health applications present a promising avenue for low-cost, scalable promotion of health behavior change. Researchers are increasingly exploring adaptive algorithms that personalize interventions to each person's unique context. However, in empirical studies, mobile health applications often suffer from small effect sizes and low adherence rates, particularly in comparison to human coaching. Tailoring advice to a person's unique goals, preferences, and life circumstances is a critical component of health coaching that has been underutilized in adaptive algorithms for mobile health interventions. To address this, we introduce a new Thompson sampling algorithm that can accommodate personalized reward functions (i.e., goals, preferences, and constraints) while also leveraging data sharing across individuals to provide effective recommendations more quickly. We prove that our modification incurs only a constant penalty on cumulative regret while preserving the sample complexity benefits of data sharing. We present empirical results on synthetic and semi-synthetic physical activity simulators. For the semi-synthetic simulator, we conducted an online survey to solicit preference data relating to physical activity, which we use to construct realistic reward models that leverage historical data from another study. Our algorithm achieves substantial performance improvements compared to baselines that do not share data or do not optimize for individualized rewards.
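The general idea of combining Thompson sampling with per-person reward functions and pooled data can be illustrated with a toy sketch. This is not the algorithm introduced in this work. It assumes a simple setting chosen only for illustration: each action has an unknown mean outcome shared by all users, the posterior over those means is Gaussian and updated with every user's data, and each user's reward is a known personal utility of the form benefit weight times outcome minus a personal action cost. The function name `run_shared_thompson`, its parameters, and the utility form are all hypothetical.

```python
import numpy as np


def run_shared_thompson(true_means, benefit_weights, action_costs,
                        n_rounds=300, noise_sd=1.0, prior_var=10.0, seed=0):
    """Toy Gaussian Thompson sampling with a pooled outcome model.

    true_means:      (n_actions,) mean outcome of each action, shared by all users
    benefit_weights: (n_users,) how much each user values one unit of outcome
    action_costs:    (n_users, n_actions) personal cost of each action
    Returns the fraction of rounds in which each user received their best action.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    benefit_weights = np.asarray(benefit_weights, dtype=float)
    action_costs = np.asarray(action_costs, dtype=float)
    n_users, n_actions = action_costs.shape

    # Conjugate Gaussian posterior over each action's mean outcome
    # (prior mean 0, variance prior_var), pooled across all users.
    precision = np.full(n_actions, 1.0 / prior_var)
    weighted_sum = np.zeros(n_actions)

    # Each user's truly best action under their own (assumed known) utility.
    true_utility = benefit_weights[:, None] * true_means[None, :] - action_costs
    best_action = true_utility.argmax(axis=1)
    hits = np.zeros(n_users)

    for _ in range(n_rounds):
        for user in range(n_users):
            # Sample plausible outcome means from the shared posterior.
            post_mean = weighted_sum / precision
            post_sd = np.sqrt(1.0 / precision)
            sampled_means = rng.normal(post_mean, post_sd)

            # Act greedily with respect to this user's personal utility.
            utility = benefit_weights[user] * sampled_means - action_costs[user]
            action = int(utility.argmax())

            # Observe a noisy outcome and update the shared posterior.
            outcome = true_means[action] + noise_sd * rng.normal()
            precision[action] += 1.0 / noise_sd ** 2
            weighted_sum[action] += outcome / noise_sd ** 2

            hits[user] += action == best_action[user]

    return hits / n_rounds


if __name__ == "__main__":
    # Two users value outcomes equally, but the second finds action 2 very costly.
    rates = run_shared_thompson(
        true_means=[1.0, 3.0, 5.0],
        benefit_weights=[1.0, 1.0],
        action_costs=[[0.0, 0.0, 0.0], [0.0, 0.0, 10.0]],
    )
    print(rates)
```

In this sketch the two users end up preferring different actions even though they learn from a single pooled outcome model, which is the intuition behind sharing data while optimizing individualized rewards. A realistic version would need contextual features, learned rather than known preferences, and the regret analysis described above.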