Generating explanations for reinforcement learning (RL) is challenging because actions may have long-term effects. In this paper, we develop a novel framework for explainable RL that learns a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret their long-term effects through causal chains, which show how actions influence environmental variables and ultimately lead to rewards. Unlike most explanatory models, which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable to model-based learning. Overall, we demonstrate that our causal model can serve as a bridge between explainability and learning.
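As a toy illustration of what a causal chain is, the sketch below enumerates every directed path from an action node to the reward node in a small causal graph. The graph, its variable names (`agent_x`, `has_key`, and so on) and the helper `causal_chains` are hypothetical. In the framework described above the causal structure is learned from data rather than written by hand, and this sketch does not reproduce the paper's learning or explanation procedure.

```python
from typing import Dict, List

# Hypothetical causal graph: each key is a variable, and its list holds the
# variables it directly influences. In the framework described above this
# structure would be learned; here it is hand-written for illustration only.
CausalGraph = Dict[str, List[str]]


def causal_chains(graph: CausalGraph, source: str, target: str) -> List[List[str]]:
    """Return every simple directed path from `source` to `target` (depth-first).

    The `child not in path` check prevents infinite loops if the graph has cycles.
    """
    chains: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == target:
            chains.append(path)
            return
        for child in graph.get(node, []):
            if child not in path:
                dfs(child, path + [child])

    dfs(source, [source])
    return chains


if __name__ == "__main__":
    # Hypothetical environment: moving right changes the agent's position,
    # which affects the reward both directly and via picking up a key.
    toy_graph: CausalGraph = {
        "action:move_right": ["agent_x"],
        "agent_x": ["distance_to_goal", "on_key_tile"],
        "on_key_tile": ["has_key"],
        "has_key": ["door_open"],
        "door_open": ["reward"],
        "distance_to_goal": ["reward"],
    }
    for chain in causal_chains(toy_graph, "action:move_right", "reward"):
        print(" -> ".join(chain))
```

Each printed path is one candidate explanation of the form "this action changed these variables, which in turn produced the reward"; a learned model would additionally attach quantitative effects to each edge, which this sketch omits.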