Language-conditioned robot behavior plays a vital role in executing complex tasks by associating human commands or instructions with perception and actions. The ability to compose long-horizon tasks from unconstrained language instructions necessitates acquiring a diverse set of general-purpose skills. However, acquiring inherent primitive skills in a coupled, long-horizon environment without external rewards or human supervision presents significant challenges. In this paper, we analyze the relationship between skills and language instructions from a mathematical perspective, employing two forms of mutual information within the framework of language-conditioned policy learning. To maximize the mutual information between language and skills in an unsupervised manner, we propose an end-to-end imitation learning approach termed Language Conditioned Skill Discovery (LCSD). Specifically, we utilize vector quantization to learn discrete latent skills and leverage the skill sequences of trajectories to reconstruct high-level semantic instructions. Through extensive experiments on language-conditioned robotic navigation and manipulation tasks spanning BabyAI, LORel, and CALVIN, we demonstrate the superiority of our method over prior works. Our approach exhibits enhanced generalization to unseen tasks, improved skill interpretability, and notably higher task completion success rates.
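The vector-quantization step mentioned above can be illustrated with a minimal sketch: each continuous latent produced along a trajectory is snapped to its nearest entry in a learned codebook, yielding a discrete skill index per timestep. This is a toy NumPy illustration under assumed shapes (the encoder, codebook sizes, and training losses here are hypothetical, not the paper's actual architecture):

```python
import numpy as np

def quantize(latents, codebook):
    """Map each continuous latent to its nearest codebook entry (skill).

    latents:  (T, D) per-timestep encoder outputs (hypothetical encoder)
    codebook: (K, D) embeddings of K discrete skills
    Returns (indices, quantized): discrete skill ids and their embeddings.
    """
    # Squared Euclidean distance from every latent to every codebook entry.
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)      # discrete skill index per timestep
    return idx, codebook[idx]   # quantized latents replace the continuous ones

rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 8))  # K=4 skills, D=8 dims (toy sizes)
latents = rng.normal(size=(6, 8))   # a 6-step trajectory of encoder outputs
skills, z_q = quantize(latents, codebook)
```

In training, the resulting skill sequence (`skills`) would serve as the discrete representation from which the high-level instruction is reconstructed; gradient flow through the non-differentiable `argmin` is typically handled with a straight-through estimator, omitted here for brevity.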