There is growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve deciding when to treat and how to treat based on the user's context (e.g., prior activity level, location). Online RL is a promising data-driven approach to this problem because it learns from each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an "optimized" intervention for real-world deployment, we must assess whether the data provide evidence that the RL algorithm is actually personalizing treatments to its users. Because the RL algorithm is stochastic, one may get the false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the algorithm's stochasticity. We illustrate our methodology with a case study analyzing data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users and within specific users in the study.