Generating explanations for reinforcement learning (RL) is challenging because an action's effects can extend far into the future. In this paper, we develop a novel framework for explainable RL that learns a causal world model without prior knowledge of the environment's causal structure. The model captures the influence of actions, allowing us to interpret their long-term effects through causal chains, which show how actions influence environmental variables and ultimately lead to rewards. Unlike most explanatory models, which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable to model-based learning. We thereby demonstrate that our causal model can serve as a bridge between explainability and learning.