A key strategy in societal adaptation to climate change is using alert systems to prompt preventative action and reduce the adverse health impacts of extreme heat events. This paper implements and evaluates reinforcement learning (RL) as a tool to optimize the effectiveness of such systems. Our contributions are threefold. First, we introduce a new publicly available RL environment enabling the evaluation of the effectiveness of heat alert policies to reduce heat-related hospitalizations. The reward model is trained on a comprehensive dataset of historical weather, Medicare health records, and socioeconomic/geographic features. We use scalable Bayesian techniques tailored to the low-signal effects and spatial heterogeneity present in the data. The transition model uses real historical weather patterns enriched by a data augmentation mechanism based on climate region similarity. Second, we use this environment to evaluate standard RL algorithms in the context of heat alert issuance. Our analysis shows that policy constraints are needed to improve RL's initially poor performance. Third, a post-hoc contrastive analysis provides insight into scenarios where our modified RL-based heat alert policies yield significant gains or losses over the current National Weather Service alert policy in the United States.
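To make the setup concrete, the kind of environment described above can be sketched as a minimal episodic interface: daily binary alert decisions over a summer season, a seasonal alert budget acting as a policy constraint, a reward standing in for (negative) heat-related hospitalizations, and a transition driven by a weather trajectory. Everything below is an illustrative assumption for exposition, not the paper's actual implementation: the class name, the toy random-walk weather dynamics (a stand-in for replayed historical weather), and the toy risk formula (a stand-in for the learned Bayesian reward model).

```python
import random


class HeatAlertEnv:
    """Hypothetical sketch of a heat-alert RL environment.

    One episode is one summer season in one location; each day the agent
    chooses whether to issue a heat alert, subject to a seasonal budget.
    All dynamics and constants here are illustrative, not the paper's.
    """

    def __init__(self, season_length=90, alert_budget=10, seed=0):
        self.season_length = season_length
        self.alert_budget = alert_budget  # policy constraint on total alerts
        self.rng = random.Random(seed)

    def reset(self):
        self.day = 0
        self.alerts_left = self.alert_budget
        self.heat_index = self.rng.uniform(70.0, 90.0)  # toy value, deg F
        return self._obs()

    def _obs(self):
        return (self.day, self.heat_index, self.alerts_left)

    def step(self, action):
        # action: 1 = issue alert, 0 = no alert; budget is enforced here.
        issue = action == 1 and self.alerts_left > 0
        if issue:
            self.alerts_left -= 1
        # Toy reward: negative heat-related risk, dampened when an alert
        # is issued on a hot day (stand-in for the learned reward model).
        base_risk = max(0.0, (self.heat_index - 85.0) / 10.0)
        reward = -base_risk * (0.5 if issue else 1.0)
        # Toy transition: random-walk heat index (stand-in for replayed
        # historical weather trajectories with climate-region augmentation).
        self.heat_index += self.rng.uniform(-3.0, 3.0)
        self.day += 1
        done = self.day >= self.season_length
        return self._obs(), reward, done


# Roll out a naive threshold policy (alert above 88F) for one episode.
env = HeatAlertEnv(seed=42)
obs, done, total = env.reset(), False, 0.0
while not done:
    action = 1 if obs[1] > 88.0 else 0
    obs, r, done = env.step(action)
    total += r
print(round(total, 2))  # cumulative (negative) risk over the season
```

An RL agent would replace the threshold policy with a learned one; the budget constraint built into `step` mirrors the finding that unconstrained policies perform poorly in this low-signal setting.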