Blood Glucose (BG) control, which keeps an individual's BG within a healthy range through exogenous insulin injections, is an important task for people with type 1 diabetes. However, traditional patient self-management is cumbersome and risky. Recent research has explored individualized and automated BG control approaches, among which Deep Reinforcement Learning (DRL) shows promise as an emerging approach. In this paper, we use an exponential decay model of drug concentration to convert the formalization of the BG control problem, which accounts for the delayed and prolonged effects of the drug, from a PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process) to an MDP, and we propose a novel multi-step DRL-based algorithm to solve it. The algorithm also uses Prioritized Experience Replay (PER) for sampling. Compared to single-step bootstrapped updates, multi-step learning is more efficient and reduces the influence of biased targets. In the same training environment, our method converges faster and achieves higher cumulative rewards than the benchmark, and in the evaluation phase it improves time-in-range (TIR), the percentage of time the patient's BG is within the target range. Our work validates the effectiveness of multi-step reinforcement learning in BG control, which may help in exploring optimal glycemic control strategies and improving the survival of diabetic patients.
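The abstract does not spell out the decay model, so the following is only a minimal sketch under stated assumptions: active insulin decays exponentially with a fixed half-life, and the decayed concentration is appended to the BG observation so that the lingering effect of past doses becomes part of the state, which is the intuition behind recasting a PAE-POMDP as an MDP. The class `InsulinDecayTracker`, the function `augmented_state`, and the half-life and step values are hypothetical, not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class InsulinDecayTracker:
    """Hypothetical exponential-decay model of active insulin concentration."""

    half_life_min: float = 60.0  # assumed half-life, not a value from the paper
    step_min: float = 5.0        # assumed control interval in minutes
    concentration: float = 0.0   # current active-insulin level (dose units)

    @property
    def retention(self) -> float:
        # Fraction of active insulin left after one step: exp(-ln(2) * dt / half-life).
        return math.exp(-math.log(2) * self.step_min / self.half_life_min)

    def step(self, dose: float) -> float:
        # c_{t+1} = retention * c_t + u_t: earlier doses fade, the new dose is added.
        self.concentration = self.retention * self.concentration + dose
        return self.concentration


def augmented_state(bg: float, tracker: InsulinDecayTracker) -> tuple:
    # Pair the observed BG with the decayed insulin level so the state carries
    # the prolonged effect of earlier doses (hypothetical state layout).
    return (bg, tracker.concentration)
```

With `step_min` equal to `half_life_min`, the retention factor is 0.5, so a single dose of 1.0 followed by two zero doses yields concentrations of 1.0, 0.5, and 0.25. Because the update is recursive, the agent never needs the full dose history, only the current concentration.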
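The contrast between single-step and multi-step bootstrapped updates can be made concrete with a generic n-step temporal-difference target. This is a sketch of the standard n-step Q-learning target, not necessarily the paper's exact update rule; `bootstrap_value` is assumed to be something like max_a Q_target(s_{t+n}, a) from a target network, and the function name is hypothetical.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], dones: Sequence[bool],
                  bootstrap_value: float, gamma: float) -> float:
    """Generic n-step TD target for a window of n transitions.

    Returns sum_{k<n} gamma^k * r_{t+k} + gamma^n * bootstrap_value, stopping early
    (with no bootstrap) if the episode terminates inside the window.
    """
    if len(rewards) != len(dones):
        raise ValueError("rewards and dones must have the same length")
    target = 0.0
    for k, (reward, done) in enumerate(zip(rewards, dones)):
        target += (gamma ** k) * reward
        if done:
            return target  # terminal state reached: nothing left to bootstrap from
    return target + (gamma ** len(rewards)) * bootstrap_value
```

With a single transition this reduces to the one-step target r_t + gamma * bootstrap_value. Longer windows take more of the target from observed rewards and shrink the weight on the estimated value by gamma^n, which is why multi-step targets are less exposed to a biased value estimate.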
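PER replaces uniform replay sampling with sampling proportional to each transition's TD error. A minimal sketch of the proportional variant follows; the function name `per_sample` and the values of `alpha`, `beta`, and `eps` are illustrative assumptions, and the abstract does not say which PER variant or hyperparameters the paper uses.

```python
import random
from typing import List, Optional, Sequence, Tuple


def per_sample(td_errors: Sequence[float], batch_size: int, alpha: float = 0.6,
               beta: float = 0.4, eps: float = 1e-6,
               rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """Proportional prioritized sampling over a flat list of stored TD errors.

    Priority p_i = (|delta_i| + eps)^alpha, probability P(i) = p_i / sum_j p_j,
    importance-sampling weight w_i = (N * P(i))^(-beta), normalized by the batch max.
    This O(N) version is for illustration; practical buffers use a sum-tree.
    """
    rng = rng or random.Random()
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    probs = [p / total for p in priorities]
    n = len(probs)
    # Draw indices with replacement, weighted by their sampling probability.
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # Importance-sampling weights offset the bias of non-uniform sampling.
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    w_max = max(weights)
    return indices, [w / w_max for w in weights]
```

The `eps` term keeps transitions with zero TD error sampleable, and the importance-sampling weights would multiply each sample's loss so that frequently replayed, high-error transitions do not dominate the gradient.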
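TIR as defined above can be computed directly from a BG trace. This is a minimal sketch that assumes readings at a fixed sampling interval (so counting readings is equivalent to measuring time) and the widely used 70 to 180 mg/dL target range; the abstract does not state the paper's own bounds, so `low` and `high` are left as parameters and the function name is hypothetical.

```python
from typing import Sequence


def time_in_range(bg_mg_dl: Sequence[float], low: float = 70.0, high: float = 180.0) -> float:
    """Percentage of equally spaced BG readings that fall within [low, high] mg/dL."""
    if len(bg_mg_dl) == 0:
        raise ValueError("need at least one BG reading")
    in_range = sum(1 for bg in bg_mg_dl if low <= bg <= high)
    return 100.0 * in_range / len(bg_mg_dl)
```

For example, the trace [60, 100, 150, 200] has two of four readings inside the range, giving a TIR of 50%.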