Generating explanations for reinforcement learning (RL) is challenging because actions may have long-term effects. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret their long-term effects through causal chains, which show how actions influence environmental variables and ultimately lead to rewards. Unlike most explanatory models, which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable to model-based learning. As a result, we demonstrate that our causal model can serve as a bridge between explainability and learning.