Despite the impressive feats demonstrated by Reinforcement Learning (RL), RL algorithms have seen little adoption in high-risk, real-world applications because it remains difficult to explain RL agent actions and to build user trust. We present Counterfactual Demonstrations for Explanation (CODEX), a method that incorporates semantic clustering to effectively summarize RL agent behavior in the state-action space. Experiments in the MiniGrid and StarCraft II game environments reveal that the semantic clusters retain temporal as well as entity information, which is reflected in the constructed summary of agent behavior. Furthermore, clustering the combined discrete and continuous latent representations of game states identifies the most crucial episodic events, demonstrating a relationship between the latent and semantic spaces. This work adds to the growing body of research that strives to unlock the power of RL for widespread use by leveraging and extending techniques from Natural Language Processing.