Action advising is a knowledge transfer technique for reinforcement learning based on the teacher-student paradigm. An expert teacher provides advice to a student during training in order to improve the student's sample efficiency and policy performance. Such advice is commonly given in the form of state-action pairs. However, this form makes it difficult for the student to reason about the advice and apply it to novel states. We introduce Explainable Action Advising, in which the teacher provides action advice together with an associated explanation indicating why the action was chosen. This allows the student to self-reflect on what it has learned, enabling advice generalization and leading to improved sample efficiency and learning performance, even in environments where the teacher is sub-optimal. We empirically show that our framework is effective in both single-agent and multi-agent scenarios, yielding improved policy returns and convergence rates when compared to state-of-the-art methods.
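To make the contrast with plain state-action advice concrete, the following is a minimal sketch of how a student might reuse explained advice in states it has never been advised on. The abstract does not specify what form the explanations take, so everything here is an assumption made for illustration: the explanation is modeled as a conjunction of threshold predicates over named state features, and the names `Predicate`, `ExplainedAdvice`, `Student`, and the feature names are hypothetical rather than taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]  # assumed representation: named numeric features


@dataclass(frozen=True)
class Predicate:
    """One hypothetical explanation clause, e.g. distance <= 5.0."""
    feature: str
    op: str  # either "<=" or ">"
    threshold: float

    def holds(self, state: State) -> bool:
        value = state[self.feature]
        if self.op == "<=":
            return value <= self.threshold
        return value > self.threshold


@dataclass(frozen=True)
class ExplainedAdvice:
    """An advised action plus the conditions under which it was chosen."""
    action: str
    explanation: Tuple[Predicate, ...]

    def applies_to(self, state: State) -> bool:
        # The advice generalizes to any state satisfying every clause.
        return all(p.holds(state) for p in self.explanation)


class Student:
    """Stores explained advice and falls back to its own policy otherwise."""

    def __init__(self, own_policy: Callable[[State], str]) -> None:
        self.own_policy = own_policy
        self.memory: List[ExplainedAdvice] = []

    def receive(self, advice: ExplainedAdvice) -> None:
        self.memory.append(advice)

    def act(self, state: State) -> str:
        # Reuse the first stored advice whose explanation covers this state.
        for advice in self.memory:
            if advice.applies_to(state):
                return advice.action
        return self.own_policy(state)


# Hypothetical usage: one piece of advice covers a whole region of states.
student = Student(own_policy=lambda s: "explore")
student.receive(ExplainedAdvice(
    action="brake",
    explanation=(Predicate("distance", "<=", 5.0), Predicate("speed", ">", 1.0)),
))
advised = student.act({"distance": 3.0, "speed": 2.0})
novel = student.act({"distance": 4.5, "speed": 9.0})
uncovered = student.act({"distance": 20.0, "speed": 2.0})
```

With bare state-action pairs, the student could only replay `brake` in the exact state where it was advised. Here the second call reuses the advice in an unseen state because the explanation still holds, and the third call falls back to the student's own policy because it does not. How the student should weigh stored advice against its own learned policy, particularly when the teacher is sub-optimal, is left out of this sketch.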