Imitation learning, which learns an agent's policy by mimicking expert demonstrations, has shown promising results in applications such as medical treatment regimes and self-driving vehicles. However, interpreting the control policies learned by the agent remains difficult, for two main reasons: 1) agents in imitation learning are usually implemented as deep neural networks, which are black-box models that lack interpretability; 2) the latent causal mechanism behind an agent's decisions may vary along the trajectory rather than staying static across time steps. To increase the transparency and interpretability of the neural agent, we propose to expose its captured knowledge in the form of a directed acyclic causal graph, whose nodes are action and state variables and whose edges denote the causal relations behind predictions. Furthermore, we design this causal discovery process to be state-dependent, enabling it to model the dynamics of the latent causal graphs. Concretely, we conduct causal discovery from the perspective of Granger causality and propose a self-explainable imitation learning framework, {\method}. The proposed framework consists of three parts, a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained end to end. Once the model is trained, we can obtain the causal relations among state and action variables behind its decisions, exposing the policies it has learned. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed {\method} in learning dynamic causal graphs for understanding the decision-making of imitation learning while maintaining high prediction accuracy.
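As a rough illustration of the Granger-causality perspective mentioned above, the sketch below implements the classical linear, static test with NumPy: a variable is said to Granger-cause another if adding its past values reduces the error of predicting the other beyond what the target's own past achieves. This is not the state-dependent neural discovery module of the proposed framework; the function names (`lagged`, `residual_variance`, `granger_score`), the lag of 2, and the synthetic "state" and "action" series are all assumptions made for illustration.

```python
import numpy as np


def lagged(series, lag):
    """Return columns holding the previous `lag` values, aligned with series[lag:]."""
    n = len(series)
    return np.column_stack([series[lag - k - 1 : n - k - 1] for k in range(lag)])


def residual_variance(features, target):
    """Mean squared residual of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.mean((target - design @ coef) ** 2)


def granger_score(cause, effect, lag=2):
    """Log ratio of residual variances: restricted model (effect's own past only)
    versus full model (effect's past plus cause's past). Larger values suggest
    that `cause` helps predict `effect` in the Granger sense."""
    target = effect[lag:]
    own_past = lagged(effect, lag)
    both_past = np.column_stack([own_past, lagged(cause, lag)])
    return np.log(residual_variance(own_past, target) / residual_variance(both_past, target))


# Hypothetical toy trajectory: the action at step t depends on the state at step t-1.
rng = np.random.default_rng(0)
state = rng.normal(size=500)
action = np.zeros(500)
for t in range(1, 500):
    action[t] = 0.8 * state[t - 1] + 0.1 * rng.normal()

score_state_to_action = granger_score(state, action)
score_action_to_state = granger_score(action, state)
print("state -> action:", score_state_to_action)
print("action -> state:", score_action_to_state)
```

In this toy setting the state-to-action score should come out much larger than the reverse score, which corresponds to a single directed edge from the state variable to the action variable. A state-dependent method, as described in the abstract, would let such edges change along the trajectory instead of fixing them once for the whole series.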