We propose a model-free reinforcement learning architecture, called distributed attentional actor architecture after conditional attention (DA6-X), that improves the interpretability of conditional coordinated behaviors. Its underlying principle is to reuse the saliency vector, which represents conditional states of the environment such as the global positions of agents. Because DA6-X can be flexibly built into an agent's policy, agents that use it can take the additional information in the conditional states into account during decision-making and exhibit superior performance. We evaluated the proposed method experimentally by comparing it with conventional methods in an object collection game. By visualizing the attention weights from DA6-X, we confirmed that agents learn situation-dependent coordinated behaviors by correctly identifying various conditional states, which improves both their interpretability and their performance.
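To make the idea of attention conditioned on a state such as global position more concrete, the following is a minimal sketch, not the DA6-X implementation. It assumes a single-head scaled dot-product attention in which a query vector is projected from the conditional state and attends over feature vectors for regions of the agent's local view. The function names, the dimensions, and the random matrices standing in for learned weights are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def conditional_attention(condition, patches, w_cond, w_key, w_value):
    """Attend over observation patches with a query built from the conditional state.

    condition: conditional state, e.g. a normalized global (x, y) position.
    patches:   list of feature vectors, one per region of the local view.
    Returns (context vector, attention weights over patches).
    """
    query = matvec(w_cond, condition)  # assumed saliency-style query vector
    keys = [matvec(w_key, p) for p in patches]
    values = [matvec(w_value, p) for p in patches]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return context, weights


rng = random.Random(0)
d_model, d_patch, n_patches = 8, 4, 9  # illustrative sizes only
w_cond = random_matrix(d_model, 2, rng)
w_key = random_matrix(d_model, d_patch, rng)
w_value = random_matrix(d_model, d_patch, rng)
patches = [[rng.random() for _ in range(d_patch)] for _ in range(n_patches)]

# Same local view, two different global positions.
context_a, weights_a = conditional_attention([0.1, 0.9], patches, w_cond, w_key, w_value)
context_b, weights_b = conditional_attention([0.9, 0.1], patches, w_cond, w_key, w_value)
```

In this sketch, `weights_a` and `weights_b` are distributions over the nine patches for the same local view under two different positions. Plotting such weights over the view is one way to inspect which regions an agent attends to in each conditional state.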