While Reinforcement Learning (RL) has achieved tremendous success in sequential decision-making problems across many domains, it still faces the key challenges of data inefficiency and lack of interpretability. Recently, many researchers have drawn on insights from the causality literature, producing a growing body of work that combines the merits of causality with RL to address these challenges. It is therefore both necessary and valuable to collate these Causal Reinforcement Learning (CRL) works, review CRL methods, and investigate what causality can contribute to RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance. We further analyze each category in terms of the formalization of different models, including the Markov Decision Process (MDP), Partially Observable Markov Decision Process (POMDP), Multi-Armed Bandits (MAB), and Dynamic Treatment Regime (DTR). Moreover, we summarize the evaluation metrics and open-source resources, discuss emerging applications, and outline promising prospects for the future development of CRL.