Explaining the behavior of intelligent agents such as robots to humans is challenging because of their incomprehensible proprioceptive states, varying intermediate goals, and the resulting unpredictability. Moreover, one-step explanations for reinforcement learning agents can be ambiguous because they fail to account for the agent's future behavior at each transition, which further complicates explaining robot actions. By leveraging abstracted actions that map to task-specific primitives, we avoid explanations at the movement level. Our proposed framework combines reward decomposition (RD) with abstracted action spaces into an explainable learning framework, enabling unambiguous, high-level explanations based on object properties in the task. We demonstrate the effectiveness of our framework through quantitative and qualitative analyses of two robot scenarios, showcasing visual and textual explanations, derived from the output artifacts of RD, that are easy for humans to comprehend. Additionally, we demonstrate the versatility of integrating these artifacts with large language models (LLMs) for reasoning and interactive querying.
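To make the reward decomposition idea more concrete, the following minimal Python sketch shows how decomposed action values over abstract actions can yield component-level explanations. In a common formulation of RD, the agent learns one Q-value per reward component, and its total value is their sum, $Q(s,a) = \sum_c Q_c(s,a)$. Comparing two actions component by component then gives a reward difference explanation, and the smallest set of positive components that outweighs all negative ones gives a minimal sufficient explanation. This is not the authors' implementation. The reward components (`task_progress`, `collision_penalty`, `time_cost`), the abstract action names such as `pick(red_block)`, and all Q-values are hypothetical, chosen only for illustration.

```python
# Hypothetical reward components; an RD agent would learn one Q-head per component.
COMPONENTS = ["task_progress", "collision_penalty", "time_cost"]

# Hypothetical per-component Q-values Q_c(s, a) for a single state s,
# indexed by abstract (task-level) actions rather than low-level motions.
Q_COMPONENTS = {
    "pick(red_block)":  {"task_progress": 4.0, "collision_penalty": -1.5, "time_cost": -0.5},
    "pick(blue_block)": {"task_progress": 2.5, "collision_penalty": -0.2, "time_cost": -0.4},
    "push(red_block)":  {"task_progress": 1.0, "collision_penalty": -0.1, "time_cost": -0.3},
}


def total_q(action):
    """Total Q-value: the sum of the component Q-values for this action."""
    return sum(Q_COMPONENTS[action][c] for c in COMPONENTS)


def greedy_action():
    """Action the agent would choose: the one with the highest total Q-value."""
    return max(Q_COMPONENTS, key=total_q)


def reward_difference(a1, a2):
    """Per-component advantage of a1 over a2 (a reward difference explanation)."""
    return {c: round(Q_COMPONENTS[a1][c] - Q_COMPONENTS[a2][c], 6) for c in COMPONENTS}


def minimal_sufficient_explanation(delta):
    """Smallest set of positive components whose summed advantage exceeds
    the total disadvantage from the negative components.

    Assumes the preferred action has a higher total Q-value than the alternative.
    """
    disadvantage = -sum(v for v in delta.values() if v < 0)
    positives = sorted(
        ((c, v) for c, v in delta.items() if v > 0), key=lambda cv: cv[1], reverse=True
    )
    chosen, acc = [], 0.0
    for c, v in positives:
        chosen.append(c)
        acc += v
        if acc > disadvantage:
            break
    return chosen


if __name__ == "__main__":
    best = greedy_action()
    alt = "pick(blue_block)"
    delta = reward_difference(best, alt)
    print(f"Chosen: {best}, compared with {alt}")
    print("Reward difference per component:", delta)
    print("Minimal sufficient explanation:", minimal_sufficient_explanation(delta))
```

Because the actions are task-level primitives, the resulting explanation reads as "picking the red block instead of the blue one is preferred mainly for task progress, despite a higher collision penalty", rather than a statement about joint motions. The component values are returns, not one-step rewards, so they reflect expected future behavior after each transition.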