Advancing Safe Mechanical Ventilation Using Offline RL With Hybrid Actions and Clinically Aligned Rewards

Invasive mechanical ventilation (MV) is a life-sustaining therapy commonly used in the intensive care unit (ICU) for patients with severe and acute conditions, many of whom rely on MV to breathe. Given the high risk of death in such cases, optimal MV settings can reduce mortality, minimize ventilator-induced lung injury, shorten ICU stays, and ease the strain on healthcare resources. However, optimizing MV settings remains a complex and error-prone process because of patient-specific variability. While offline reinforcement learning (RL) shows promise for optimizing MV settings, current methods struggle with the hybrid (continuous and discrete) nature of those settings. Discretizing the continuous settings makes the action space grow exponentially with the number of settings, which limits how many settings can be optimized. Converting the discretized predictions back to continuous values can cause distribution shift, compromising safety and performance. To address this challenge, in the IntelliLung project we are developing an AI-based approach that constrains the action space and employs factored action critics. This approach allows us to scale to six optimizable settings, compared with two to three in previous studies. We also adapt state-of-the-art (SOTA) offline RL algorithms to operate directly on hybrid action spaces, avoiding the pitfalls of discretization. In addition, we introduce a clinically grounded reward function based on ventilator-free days and physiological targets. Using multiobjective optimization for reward selection, we show that this leads to a more equitable consideration of all clinically relevant objectives. Notably, the system is developed in close collaboration with healthcare professionals, is aligned with real-world clinical objectives, and is designed with future deployment in mind.
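The scaling argument can be illustrated with a small sketch. The abstract does not say how the critic is factored or how many bins each setting would receive, so the seven bins per setting and the linear decomposition Q(s, a) = sum_i Q_i(s, a_i) below are assumptions chosen for illustration, and all function names are hypothetical.

```python
from math import prod

# Hypothetical number of discrete bins for each of six ventilator settings.
# The value 7 is illustrative only and does not come from the paper.
BINS = [7, 7, 7, 7, 7, 7]


def joint_action_count(bins):
    """Outputs a single critic needs if it scores every joint combination."""
    return prod(bins)


def factored_action_count(bins):
    """Outputs needed if each setting gets its own critic head."""
    return sum(bins)


def factored_q(per_setting_q, action):
    """Assumed linear decomposition: Q(s, a) = sum_i Q_i(s, a_i).

    per_setting_q: one list of Q-values per setting, for a fixed state s.
    action: one chosen bin index per setting.
    """
    return sum(q_i[a_i] for q_i, a_i in zip(per_setting_q, action))


if __name__ == "__main__":
    print("joint outputs:   ", joint_action_count(BINS))     # 7**6 = 117649
    print("factored outputs:", factored_action_count(BINS))  # 6 * 7 = 42
```

Under these assumptions the joint critic must score 117,649 combinations while the factored critic needs only 42 outputs, which is one way to see why a factored design can cover more settings.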
