Pain management in intensive care usually involves complex trade-offs between therapeutic goals and patient safety, since both inadequate and excessive treatment can lead to serious sequelae. Reinforcement learning can help address this challenge by learning medication dosing policies from retrospective data. However, prior work on sedation and analgesia has optimized objectives that do not value patient survival, while relying on algorithms unsuitable for imperfect-information settings. We investigated the risks of these design choices by implementing a deep reinforcement learning framework that suggests hourly medication doses under partial observability. Using data from 47,144 ICU stays in the MIMIC-IV database, we trained policies to prescribe opioids, propofol, benzodiazepines, and dexmedetomidine according to two goals: reduce pain, or jointly reduce pain and mortality. We found that, although both policies were associated with lower pain, actions from the first policy were positively correlated with mortality, while those proposed by the second policy were negatively correlated with it. This suggests that valuing long-term outcomes could be critical for safer treatment policies, even if a short-term goal remains the primary objective.
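To make the contrast between the two training goals concrete, the following is a minimal sketch of how a pain-only reward and a joint pain-and-mortality reward might differ. It is not the reward used in this work: the `Step` record, the 0-10 pain scale, the hourly pain-difference term, and the `death_penalty` weight are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One hourly transition in an ICU stay (hypothetical structure)."""
    pain_score: float             # assumed 0-10 pain rating at this hour
    next_pain_score: float        # assumed rating one hour later
    terminal: bool = False        # True on the last hour of the stay
    died: Optional[bool] = None   # outcome, only meaningful at the terminal step


def pain_only_reward(step: Step) -> float:
    """Reward the hourly drop in pain; survival never enters the signal."""
    return step.pain_score - step.next_pain_score


def pain_and_mortality_reward(step: Step, death_penalty: float = 10.0) -> float:
    """Same hourly pain term, minus an assumed terminal penalty if the patient died."""
    reward = pain_only_reward(step)
    if step.terminal and step.died:
        reward -= death_penalty
    return reward


if __name__ == "__main__":
    last_hour = Step(pain_score=4.0, next_pain_score=3.0, terminal=True, died=True)
    print(pain_only_reward(last_hour))           # blind to the outcome
    print(pain_and_mortality_reward(last_hour))  # penalizes the death
```

Under these assumptions, the first function assigns the same reward to a final hour whether or not the patient survives, so a policy trained on it has no signal discouraging doses that lower pain at the cost of survival. The second function keeps pain reduction as the per-hour term and adds the long-term outcome only at the end of the stay.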