In this work, we aim to enable legged robots to learn to interpret human social cues and produce appropriate behaviors through physical human guidance. However, learning through physical engagement can place a heavy burden on users when the process requires large amounts of human-provided data. To address this, we propose a human-in-the-loop framework that enables robots to acquire navigational behaviors in a data-efficient manner and to be controlled via multimodal natural human inputs, specifically gestural and verbal commands. We reconstruct interaction scenes in a physics-based simulation and aggregate data to mitigate the distributional shift arising from limited demonstrations. Our progressive goal-cueing strategy adaptively feeds appropriate commands and navigation goals during training, yielding more accurate navigation and stronger alignment between human input and robot behavior. We evaluate our framework across six real-world agile navigation scenarios, including jumping over or avoiding obstacles. Our experiments show that the proposed method succeeds in almost all trials, achieving a 97.15% task success rate with less than 1 hour of demonstration data in total.
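The data-aggregation step described above resembles DAgger-style interactive imitation learning: the learner's own rollouts are relabeled by the expert and folded into a growing dataset, so the policy is trained on the state distribution it actually visits. A minimal, self-contained sketch of that loop is below; the 1-D state, the `expert_action` labeler, and the `LookupPolicy` learner are all hypothetical stand-ins, not the paper's actual components.

```python
import random


def expert_action(state: float) -> int:
    """Hypothetical expert (human demonstrator): step toward the goal at 0."""
    return -1 if state > 0 else 1


class LookupPolicy:
    """Toy stand-in for a learned policy: one action label per state sign."""

    def __init__(self) -> None:
        self.positive_label = 1  # default action before any training
        self.negative_label = 1

    def fit(self, dataset):
        # "Train" by majority vote over expert labels for each state sign.
        pos = [a for s, a in dataset if s > 0]
        neg = [a for s, a in dataset if s <= 0]
        if pos:
            self.positive_label = max(set(pos), key=pos.count)
        if neg:
            self.negative_label = max(set(neg), key=neg.count)

    def act(self, state: float) -> int:
        return self.positive_label if state > 0 else self.negative_label


def dagger(iterations: int = 3, rollout_len: int = 20, seed: int = 0):
    """DAgger-style loop: roll out the current learner, label every visited
    state with the expert, aggregate into one dataset, and retrain."""
    rng = random.Random(seed)
    policy = LookupPolicy()
    dataset = []
    for _ in range(iterations):
        state = rng.uniform(-5.0, 5.0)
        for _ in range(rollout_len):
            # The learner picks the action (so it visits its own state
            # distribution), but the expert supplies the training label.
            dataset.append((state, expert_action(state)))
            state += policy.act(state)
        policy.fit(dataset)
    return policy


policy = dagger()
```

Because the labels come from the expert while the trajectories come from the learner, the aggregated dataset covers the learner's mistakes, which is the mechanism the abstract credits for mitigating distributional shift under limited demonstrations.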