A key strategy in societal adaptation to climate change is using alert systems to prompt preventative action and reduce the adverse health impacts of extreme heat events. This paper implements and evaluates reinforcement learning (RL) as a tool to optimize the effectiveness of such systems. Our contributions are threefold. First, we introduce a new publicly available RL environment for evaluating the effectiveness of heat alert policies in reducing heat-related hospitalizations. The reward model is trained on a comprehensive dataset of historical weather, Medicare health records, and socioeconomic/geographic features. We use scalable Bayesian techniques tailored to the low-signal effects and spatial heterogeneity present in the data. The transition model uses real historical weather patterns enriched by a data augmentation mechanism based on climate region similarity. Second, we use this environment to evaluate standard RL algorithms in the context of heat alert issuance. Our analysis shows that policy constraints are needed to improve RL's initially poor performance. Third, a post-hoc contrastive analysis provides insight into scenarios where our modified RL heat alert policies yield significant gains or losses over the current National Weather Service alert policy in the United States.
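To make the environment's role concrete, the following is a minimal, self-contained sketch of what a heat-alert RL environment interface might look like. All names, dynamics, and numbers here are illustrative assumptions: the paper's actual environment uses a Bayesian reward model fit to Medicare data and replays real historical weather trajectories, whereas this toy version draws synthetic heat indices and uses a hand-picked alert effect. The `alert_budget` field illustrates the kind of policy constraint the abstract refers to.

```python
import random


class HeatAlertEnv:
    """Toy sketch of a heat-alert RL environment (hypothetical interface;
    the real environment uses a learned Bayesian reward model and
    historical weather data, not the synthetic dynamics below)."""

    def __init__(self, episode_length=152, alert_budget=10, seed=0):
        self.episode_length = episode_length  # days in one warm season
        self.alert_budget = alert_budget      # max alerts per season (policy constraint)
        self.rng = random.Random(seed)

    def reset(self):
        self.day = 0
        self.alerts_left = self.alert_budget
        self.heat_index = self.rng.uniform(0.0, 1.0)  # normalized heat index
        return self._obs()

    def _obs(self):
        # Observation: current heat, remaining alert budget, day of season.
        return (self.heat_index, self.alerts_left, self.day)

    def step(self, action):
        # action 1 = issue an alert; only possible while budget remains.
        issue = action == 1 and self.alerts_left > 0
        if issue:
            self.alerts_left -= 1
        # Reward: negative proxy for expected hospitalizations; an alert
        # reduces harm on hot days (0.3 is an arbitrary illustrative effect).
        base_harm = self.heat_index ** 2
        reward = -(base_harm - (0.3 * base_harm if issue else 0.0))
        # Next day's weather from a toy random walk (the real environment
        # replays and augments historical weather patterns instead).
        self.day += 1
        self.heat_index = min(1.0, max(0.0, self.heat_index + self.rng.uniform(-0.2, 0.2)))
        done = self.day >= self.episode_length
        return self._obs(), reward, done
```

A simple threshold policy (alert when the heat index exceeds 0.8) can then be rolled out against this interface; an RL agent would replace that rule with a learned, budget-aware policy.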