The widespread adoption of effective hybrid closed loop systems would represent an important milestone in care for people living with type 1 diabetes (T1D). These devices typically utilise simple control algorithms to select the optimal insulin dose for maintaining blood glucose levels within a healthy range. Online reinforcement learning (RL) has been utilised as a method for further enhancing glucose control in these devices. Previous approaches have been shown to reduce patient risk and improve time spent in the target range when compared to classical control algorithms, but they are prone to instability in the learning process, which often results in the selection of unsafe actions. This work presents an evaluation of offline RL for developing effective dosing policies without the need for potentially dangerous patient interaction during training. This paper examines the utility of BCQ, CQL and TD3-BC in managing the blood glucose of the 30 virtual patients available within the FDA-approved UVA/Padova glucose dynamics simulator. This work shows that offline RL, when trained on less than a tenth of the total training samples required by online RL to achieve stable performance, can significantly increase time in the healthy blood glucose range from 61.6 ± 0.3% to 65.3 ± 0.5% when compared to the strongest state-of-the-art baseline (p < 0.001). This is achieved without any associated increase in low blood glucose events. Offline RL is also shown to be able to correct for common and challenging control scenarios such as incorrect bolus dosing, irregular meal timings and compression errors.
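The headline metric above, time in the healthy blood glucose range, can be illustrated with a minimal sketch. The abstract does not state the thresholds used, so the 70 to 180 mg/dL target range below is an assumption, and the function name `glycaemic_summary` and the sample readings are hypothetical; this is not the evaluation code used in the paper.

```python
from typing import Dict, Sequence

# Assumed target range in mg/dL; the abstract does not specify its thresholds.
TARGET_LOW = 70.0
TARGET_HIGH = 180.0


def glycaemic_summary(
    glucose_mg_dl: Sequence[float],
    low: float = TARGET_LOW,
    high: float = TARGET_HIGH,
) -> Dict[str, float]:
    """Return the percentage of readings in, below and above the target range.

    Assumes evenly spaced glucose readings, so that the share of readings
    equals the share of time.
    """
    if len(glucose_mg_dl) == 0:
        raise ValueError("at least one glucose reading is required")

    n = len(glucose_mg_dl)
    below = sum(1 for g in glucose_mg_dl if g < low)
    above = sum(1 for g in glucose_mg_dl if g > high)
    in_range = n - below - above

    return {
        "time_in_range": 100.0 * in_range / n,
        "time_below_range": 100.0 * below / n,
        "time_above_range": 100.0 * above / n,
    }


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    readings = [65, 90, 120, 150, 185, 210, 110, 95]
    print(glycaemic_summary(readings))
```

Reporting time below range alongside time in range mirrors the abstract's claim that the improvement comes without an increase in low blood glucose events.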