Blood glucose (BG) control, which keeps an individual's BG within a healthy range through exogenous insulin injections, is an important task for people with type 1 diabetes. However, traditional patient self-management is cumbersome and risky. Recent research has explored individualized and automated BG control approaches, among which Deep Reinforcement Learning (DRL) shows potential as an emerging approach. In this paper, we use an exponential decay model of drug concentration to convert the formalization of the BG control problem, which takes into account the delayed and prolonged effects of the drug, from a PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process) to an MDP, and we propose a novel multi-step DRL-based algorithm to solve the problem. The algorithm also uses Prioritized Experience Replay (PER) sampling. Compared to single-step bootstrapped updates, multi-step learning is more efficient and reduces the influence of biased targets. Our proposed method converges faster and achieves higher cumulative rewards than the benchmark in the same training environment, and in the evaluation phase it improves the time-in-range (TIR), the percentage of time the patient's BG is within the target range. Our work validates the effectiveness of multi-step reinforcement learning in BG control, which may help to explore the optimal glycemic control measure and improve the survival of diabetic patients.
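To make two of the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows a generic n-step bootstrapped target (the standard multi-step return used in DRL) and a TIR computation. The function names, the discount factor, and the 70 to 180 mg/dL target range are assumptions chosen for illustration; the abstract does not specify them.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,   # assumed discount factor, not from the paper
    done: bool = False,
) -> float:
    """Generic n-step return: sum_k gamma^k * r_k + gamma^n * V(s_n).

    `rewards` holds the n rewards collected after the current state;
    `bootstrap_value` is the value estimate at the state reached after
    n steps (for Q-learning, max over actions of Q). If the episode
    ended within the n steps, the bootstrap term is dropped.
    """
    target = 0.0 if done else bootstrap_value
    # Fold from the last reward backwards so each step applies one discount.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


def time_in_range(
    bg_values: Sequence[float],
    low: float = 70.0,     # assumed lower bound in mg/dL
    high: float = 180.0,   # assumed upper bound in mg/dL
) -> float:
    """Percentage of BG readings that fall inside [low, high]."""
    if not bg_values:
        raise ValueError("bg_values must not be empty")
    in_range = sum(1 for bg in bg_values if low <= bg <= high)
    return 100.0 * in_range / len(bg_values)
```

With n = 1 the first function reduces to the usual single-step bootstrapped target, so the same code can serve both the single-step baseline and the multi-step variant in a comparison.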