Multi-agent robotic systems are increasingly operating in real-world environments in close proximity to humans, yet are largely controlled by policy models with inscrutable deep neural network representations. We introduce a method for incorporating interpretable concepts from a domain expert into models trained through multi-agent reinforcement learning, by requiring the model to first predict such concepts then utilize them for decision making. This allows an expert to both reason about the resulting concept policy models in terms of these high-level concepts at run-time, as well as intervene and correct mispredictions to improve performance. We show that this yields improved interpretability and training stability, with benefits to policy performance and sample efficiency in a simulated and real-world cooperative-competitive multi-agent game.
翻译:多智能体机器人系统正日益在人类近距离的真实环境中运行,但这类系统大多由具有难以解读的深度神经网络表征的策略模型所控制。我们引入了一种方法,通过要求模型首先预测领域专家定义的可解释概念,继而利用这些概念进行决策,将领域专家的可解释概念融入通过多智能体强化学习训练的模型中。这使得专家既能在运行时基于这些高层概念对由此产生的概念策略模型进行推理,也能通过干预和纠正错误预测来提升性能。我们证明,该方法在模拟和真实世界中的合作-竞争多智能体博弈中,能够提高可解释性与训练稳定性,同时对策略性能和样本效率产生积极影响。