Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning

Objective: The artificial pancreas (AP) has shown promising potential for closed-loop glucose control in individuals with type 1 diabetes mellitus (T1DM). However, designing an effective AP control policy remains challenging because of complex physiological processes, delayed insulin response, and inaccurate glucose measurements. Model predictive control (MPC) offers safety and stability through its dynamic model and explicit safety constraints, but it lacks individualization and is adversely affected by unannounced meals. Conversely, deep reinforcement learning (DRL) provides personalized, adaptive strategies but is vulnerable to distribution shift and requires substantial amounts of data. Methods: We propose a hybrid control policy for the artificial pancreas (HyCPAP) to address these challenges. HyCPAP combines an MPC policy with an ensemble DRL policy, leveraging the strengths of each while compensating for their respective limitations. To accelerate the real-world deployment of AP systems, we further incorporate meta-learning into HyCPAP, drawing on prior experience and knowledge shared across patients to enable fast adaptation to new patients with limited data. Results: We conduct extensive experiments using the FDA-accepted UVA/Padova T1DM simulator across three scenarios. Among the compared methods, our approaches achieve the highest percentage of time in the target euglycemic range and the lowest incidence of hypoglycemia. Conclusion: The results demonstrate the superiority of our methods for closed-loop glucose management in individuals with T1DM. Significance: This study presents novel control policies for AP systems and highlights the potential of the proposed methods for efficient closed-loop glucose control.
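The abstract does not specify the rule HyCPAP uses to combine the MPC and ensemble DRL policies. As a minimal sketch only, assuming a simple uncertainty-gated switch (a common ensemble technique, not necessarily the authors' rule), the spread of the insulin doses proposed by the ensemble members can act as a proxy for distribution shift: when the members disagree beyond a threshold, the controller falls back to the MPC dose; otherwise it uses the ensemble mean. The function name `hybrid_insulin_action`, the `uncertainty_threshold` and `max_dose` parameters, and the switching rule itself are hypothetical.

```python
import numpy as np


def hybrid_insulin_action(mpc_action, ensemble_actions, uncertainty_threshold, max_dose):
    """Hypothetical uncertainty-gated blend of an MPC dose and an ensemble DRL dose.

    Assumption (not from the source): ensemble disagreement, measured as the
    population standard deviation of the members' proposed doses, signals that
    the current state may lie outside the DRL training distribution.
    All doses are assumed to share the same units.
    """
    actions = np.asarray(ensemble_actions, dtype=float)
    if actions.size == 0:
        raise ValueError("ensemble_actions must contain at least one dose")

    mean_action = actions.mean()
    disagreement = actions.std()  # ddof=0: population standard deviation

    if disagreement > uncertainty_threshold:
        # Members disagree: treat the state as uncertain and defer to MPC.
        dose = mpc_action
    else:
        # Members agree: use the ensemble's consensus dose.
        dose = mean_action

    # Insulin delivery cannot be negative; cap at a hypothetical safety limit.
    return float(np.clip(dose, 0.0, max_dose))


if __name__ == "__main__":
    # Ensemble members agree closely, so the ensemble mean is used.
    print(hybrid_insulin_action(0.8, [0.70, 0.72, 0.69], uncertainty_threshold=0.1, max_dose=2.0))
    # Ensemble members disagree, so the MPC dose is used.
    print(hybrid_insulin_action(0.8, [0.1, 1.5, 0.4], uncertainty_threshold=0.1, max_dose=2.0))
```

A hard switch keeps the fallback behavior easy to audit; a soft alternative would weight the two doses by a decreasing function of the ensemble disagreement.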

翻译：目的：人工胰腺在实现1型糖尿病患者的闭环血糖控制方面显示出巨大潜力。然而，由于复杂的生理过程、延迟的胰岛素响应以及不准确的血糖测量，设计有效的人工胰腺控制策略仍然具有挑战性。尽管模型预测控制通过动态模型和安全约束提供了安全性与稳定性，但缺乏个性化能力，且易受未预先通知的进食影响。相反，深度强化学习虽能提供个性化自适应策略，却面临分布偏移和数据需求庞大的挑战。方法：我们提出了一种面向人工胰腺的混合控制策略，旨在应对上述挑战。该策略融合了模型预测控制策略与集成深度强化学习策略，充分利用两者的优势并弥补各自局限。为加速人工胰腺系统在实际场景中的部署，我们进一步将元学习技术纳入该混合控制策略，利用过往经验与患者共享知识，在有限可用数据下实现对新患者的快速适应。结果：我们采用经FDA批准的UVA/Padova 1型糖尿病模拟器，在三种场景下开展了广泛实验。所提方法在目标正常血糖范围内的时间占比达到最高，且低血糖事件发生率最低。结论：实验结果明确证明了我们方法在1型糖尿病患者闭环血糖管理中的优越性。意义：本研究为人工胰腺系统提出了新颖的控制策略，证实了所提方法在实现高效闭环血糖控制方面的巨大潜力。