There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when and how to treat based on the user's context (e.g., prior activity level, location). Online RL is a promising data-driven approach for this problem because it learns from each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an "optimized" intervention for real-world deployment, we must assess whether the data provide evidence that the RL algorithm is actually personalizing treatments to its users. Because the RL algorithm is stochastic, one may get the false impression that it is learning in certain states and using what it has learned to provide specific treatments. We adopt a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the algorithm's stochasticity. We illustrate our methodology with a case study analyzing data from HeartSteps, a physical activity clinical trial that used an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization, both across all users and within specific users in the study.
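The abstract does not spell out the resampling procedure, so the following is only a generic illustration of the underlying idea, not the authors' method. It assumes a toy model: a Thompson-sampling policy with one Gaussian posterior per discrete state, a binary treatment whose effect is observed with Gaussian noise relative to a known zero baseline, and a "spread of treatment probabilities across states" statistic as a stand-in for personalization. The function names `run_thompson`, `personalization_stat`, and `resampling_pvalue` are hypothetical. The check reruns the same stochastic algorithm in a null environment where the treatment effect is identical in every state and asks how often that null alone produces at least as much apparent personalization as was observed.

```python
import math
import random


def run_thompson(effects, n_steps, noise_sd=1.0, prior_sd=1.0, rng=None):
    """Toy Thompson-sampling policy with one Gaussian posterior per state.

    Hypothetical model: effects[s] is the true treatment effect in state s,
    and the reward gain is observed only when the policy treats.
    Returns P(treat | state) = P(theta_s > 0 | data) at the end of the run.
    """
    rng = rng or random.Random()
    n_states = len(effects)
    post_mean = [0.0] * n_states                  # N(0, prior_sd^2) prior per state
    post_prec = [1.0 / prior_sd ** 2] * n_states
    noise_prec = 1.0 / noise_sd ** 2
    for _ in range(n_steps):
        s = rng.randrange(n_states)               # context arrives uniformly at random
        theta_draw = rng.gauss(post_mean[s], 1.0 / math.sqrt(post_prec[s]))
        if theta_draw > 0:                        # treat if the sampled effect is positive
            y = effects[s] + rng.gauss(0.0, noise_sd)
            # Conjugate Gaussian update; untreated steps carry no information here.
            new_prec = post_prec[s] + noise_prec
            post_mean[s] = (post_prec[s] * post_mean[s] + noise_prec * y) / new_prec
            post_prec[s] = new_prec
    # Phi(mean / sd) with sd = 1 / sqrt(precision), via the error function.
    return [0.5 * (1.0 + math.erf(m * math.sqrt(p) / math.sqrt(2.0)))
            for m, p in zip(post_mean, post_prec)]


def personalization_stat(probs):
    """Hypothetical statistic: spread of treatment probabilities across states."""
    return max(probs) - min(probs)


def resampling_pvalue(observed_stat, null_effect, n_states, n_steps,
                      n_resamples=500, seed=0):
    """Monte Carlo p-value: rerun the algorithm where no state heterogeneity exists."""
    rng = random.Random(seed)
    null_effects = [null_effect] * n_states
    exceed = 0
    for _ in range(n_resamples):
        stat = personalization_stat(run_thompson(null_effects, n_steps, rng=rng))
        exceed += stat >= observed_stat
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (exceed + 1) / (n_resamples + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated "observed" run in which the effect truly differs by state (toy data).
    observed = personalization_stat(run_thompson([0.5, -0.5], n_steps=200, rng=rng))
    p = resampling_pvalue(observed, null_effect=0.0, n_states=2, n_steps=200)
    print(f"observed spread={observed:.3f}, resampling p-value={p:.3f}")
```

Under these assumptions, a small p-value suggests the observed spread is unlikely to arise from the algorithm's randomness alone, while a large one suggests the apparent personalization could be an artifact. In a real analysis one would presumably fit the null environment to the trial data rather than fixing the effect at zero.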