Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are capable of neither fast adaptation to heterogeneous human demonstrations nor large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization; (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks, with an average 57% improvement in policy returns and an average of 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find that users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.
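To make the idea of "policy mixtures" and "expanding only when needed" concrete, the following is a minimal sketch and not the FLAIR algorithm itself. It assumes, purely for illustration, that each prototypical strategy is summarized by the actions it would take on the demonstration's states, that a mixture is a convex combination of those actions, and that fit quality is measured by mean squared action error. The function names (`project_to_simplex`, `fit_mixture_weights`, `adapt_or_expand`) and the tolerance `tol` are hypothetical; the abstract does not specify how mixtures are fitted or what triggers model expansion.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index meeting the condition
    return np.maximum(v - css[rho] / (rho + 1), 0.0)


def fit_mixture_weights(proto_actions, demo_actions, steps=500):
    """Fit convex weights over prototype actions by projected gradient descent.

    proto_actions: array (K, T, D), actions of K prototype strategies
                   on the T demonstration states (hypothetical input format).
    demo_actions:  array (T, D), the demonstrated actions.
    Returns (weights, mean squared error of the fitted mixture).
    """
    K = proto_actions.shape[0]
    X = proto_actions.reshape(K, -1)
    y = demo_actions.reshape(-1)
    n = y.size
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    lipschitz = 2.0 / n * np.linalg.norm(X, 2) ** 2 + 1e-12
    w = np.full(K, 1.0 / K)
    for _ in range(steps):
        grad = 2.0 / n * X @ (w @ X - y)
        w = project_to_simplex(w - grad / lipschitz)
    mse = float(np.mean((w @ X - y) ** 2))
    return w, mse


def adapt_or_expand(proto_actions, demo_actions, tol=1e-2):
    """Reuse existing strategies if a mixture fits; otherwise flag expansion.

    tol is an assumed threshold, not a value from the paper.
    """
    w, mse = fit_mixture_weights(proto_actions, demo_actions)
    decision = "mixture" if mse <= tol else "new_strategy"
    return decision, w


if __name__ == "__main__":
    # Two toy prototype strategies over four states with 1-D actions.
    protos = np.array([[[0.0], [1.0], [2.0], [3.0]],
                       [[1.0], [0.0], [1.0], [0.0]]])
    demo = 0.3 * protos[0] + 0.7 * protos[1]
    print(adapt_or_expand(protos, demo))
```

In this sketch, a demonstration that lies inside the convex hull of the existing strategies is explained by weights alone, so the model does not grow; a demonstration that no mixture fits within `tol` is flagged as needing a new prototypical strategy. A faithful implementation would replace the action-level error with whatever fit criterion the full method defines.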