Although Reinforcement Learning (RL) has achieved considerable success in sequential decision-making problems across many domains, it still faces two key challenges: data inefficiency and a lack of interpretability. Many researchers have recently drawn on insights from the causality literature, producing a growing body of work that combines the strengths of causality with RL to address these challenges. It is therefore both necessary and valuable to collate these Causal Reinforcement Learning (CRL) works, review CRL methods, and investigate what causality can contribute to RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance. We further analyze each category in terms of the formalization of different models: the Markov Decision Process (MDP), the Partially Observable Markov Decision Process (POMDP), Multi-Armed Bandits (MAB), and the Dynamic Treatment Regime (DTR). Moreover, we summarize the evaluation metrics and open-source resources, discuss emerging applications, and outline promising prospects for the future development of CRL.