Evidence-based or data-driven dynamic treatment regimes are essential for personalized medicine and can benefit from offline reinforcement learning (RL). Although massive healthcare data are available across medical institutions, they cannot be shared because of privacy constraints. In addition, the data are heterogeneous across sites. Federated offline RL algorithms are therefore both necessary and promising for addressing these problems. In this paper, we propose a multi-site Markov decision process model that allows both homogeneous and heterogeneous effects across sites. The proposed model makes it possible to analyze site-level features. We design the first federated policy optimization algorithm for offline RL with sample complexity guarantees. The proposed algorithm is communication-efficient and privacy-preserving: it requires only a single round of communication, in which sites exchange summary statistics. We give a theoretical guarantee for the proposed algorithm without assuming sufficient action coverage, and the suboptimality of the learned policies is comparable to the rate attainable if the data were not distributed. Extensive simulations demonstrate the effectiveness of the proposed algorithm. The method is applied to a multi-site sepsis data set to illustrate its use in clinical settings.
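To make the single-round exchange of summary statistics concrete, the following minimal sketch may help. It is not the algorithm proposed in the paper: it illustrates only the communication pattern, using ridge regression as a stand-in, and it ignores site heterogeneity, pessimism, and policy optimization. All names (`local_summary`, `aggregate`, `solve`, the `ridge` parameter) and the simulated data are assumptions made for illustration. Each site sends only its Gram matrix and moment vector, never raw records, and the aggregated estimate matches the one computed from pooled data up to floating-point rounding.

```python
import random


def local_summary(features, targets):
    """Run at one site: return the Gram matrix X^T X and moment vector X^T y."""
    d = len(features[0])
    gram = [[0.0] * d for _ in range(d)]
    moment = [0.0] * d
    for x, y in zip(features, targets):
        for i in range(d):
            moment[i] += x[i] * y
            for j in range(d):
                gram[i][j] += x[i] * x[j]
    return gram, moment


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def aggregate(summaries, ridge=1e-3):
    """Run at the server: sum the site summaries and solve the ridge system once."""
    d = len(summaries[0][1])
    gram = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    moment = [0.0] * d
    for site_gram, site_moment in summaries:
        for i in range(d):
            moment[i] += site_moment[i]
            for j in range(d):
                gram[i][j] += site_gram[i][j]
    return solve(gram, moment)


# Simulated data for three hypothetical sites sharing one linear model.
random.seed(0)
true_w = [1.0, -2.0, 0.5]
sites = []
for n in (40, 60, 25):
    X = [[random.gauss(0, 1) for _ in true_w] for _ in range(n)]
    y = [sum(wi * xi for wi, xi in zip(true_w, x)) + random.gauss(0, 0.1) for x in X]
    sites.append((X, y))

# One communication round: each site sends (gram, moment) to the server.
federated_w = aggregate([local_summary(X, y) for X, y in sites])

# Reference estimate computed as if all raw data sat in one place.
pooled_X = [x for X, _ in sites for x in X]
pooled_y = [t for _, y in sites for t in y]
pooled_w = aggregate([local_summary(pooled_X, pooled_y)])
```

Because the Gram matrix and moment vector are additive across sites, one upload per site is enough to reproduce the pooled estimate here. The guarantees stated in the abstract concern a richer setting with heterogeneous effects and limited action coverage, which this sketch does not attempt to model.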