Learning from Demonstration (LfD) can be an efficient way to train systems of similar agents: ``Student'' agents learn from the demonstrations of the most experienced ``Teacher'' agent instead of training their policies in parallel. However, when agent capabilities differ, for example in actuator power or joint-angle constraints, naively replicating demonstrations that lie outside the Student's capabilities can hinder learning. We present a Teacher-Student learning framework tailored to this heterogeneity between Teacher and Student agents. The framework is built on ``surprise'', a concept previously used to incentivize exploration in sparse-reward environments, which we repurpose so that the Teacher can detect and adapt to differences between itself and the Student. By maximizing its own surprise in response to the environment while minimizing the Student's surprise in response to the demonstrations, the Teacher tailors its demonstrations to the Student's specific capabilities and constraints. We validate our method by demonstrating improvements in the Student's learning on control tasks in sparse-reward environments.
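To make the two-sided objective above concrete, the following is a minimal sketch and not the paper's implementation. It assumes surprise is measured as the negative log-likelihood of an observed next state under an isotropic Gaussian dynamics model, and that the Teacher combines the two terms linearly. The function names `gaussian_surprise` and `teacher_bonus`, the weights `alpha` and `beta`, and the noise scale `sigma` are all hypothetical and chosen for illustration only.

```python
import math


def gaussian_surprise(predicted, observed, sigma=1.0):
    """Surprise as negative log-likelihood of `observed` under an
    isotropic Gaussian centred on `predicted` (an assumed model)."""
    squared_error = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    dim = len(predicted)
    return (0.5 * squared_error / sigma ** 2
            + 0.5 * dim * math.log(2 * math.pi * sigma ** 2))


def teacher_bonus(teacher_surprise, student_surprise, alpha=1.0, beta=1.0):
    """Hypothetical shaping term: reward the Teacher's own surprise
    (exploration) and penalise the Student's surprise (demonstrations
    the Student's model cannot explain)."""
    return alpha * teacher_surprise - beta * student_surprise


# A demonstrated transition the Student's model predicts well ...
feasible = gaussian_surprise(predicted=[0.5], observed=[0.5])
# ... versus one far outside what the Student's model expects.
infeasible = gaussian_surprise(predicted=[0.5], observed=[2.0])

# With the Teacher's own surprise held fixed, the feasible
# demonstration receives the larger bonus.
bonus_feasible = teacher_bonus(1.0, feasible)
bonus_infeasible = teacher_bonus(1.0, infeasible)
```

Under these assumptions, a demonstration that the Student's dynamics model cannot account for incurs a larger penalty, which nudges the Teacher toward behaviour the Student can reproduce while it still explores.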