Enabling robots to autonomously perform hybrid motions in diverse environments can be beneficial for long-horizon tasks such as material handling, household chores, and work assistance. This requires extensive exploitation of intrinsic motion capabilities, extraction of affordances from rich environmental information, and planning of physical interaction behaviors. Although recent progress has demonstrated impressive humanoid whole-body control abilities, existing methods struggle to achieve versatility and adaptability for new tasks. In this work, we propose HYPERmotion, a framework that learns, selects, and plans behaviors based on tasks in different scenarios. We combine reinforcement learning with whole-body optimization to generate motion for 38 actuated joints and create a motion library to store the learned skills. We apply the planning and reasoning capabilities of large language models (LLMs) to complex loco-manipulation tasks, constructing a hierarchical task graph that comprises a series of primitive behaviors to bridge lower-level execution with higher-level planning. By leveraging the interaction of distilled spatial geometry and 2D observations with a visual language model (VLM), we ground this knowledge in a robotic morphology selector that chooses appropriate actions among single- or dual-arm manipulation and legged or wheeled locomotion. Experiments in simulation and the real world show that the learned motions can efficiently adapt to new tasks, demonstrating high autonomy from free-text commands in unstructured scenes. Videos and website: hy-motion.github.io/
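To make the framework's moving parts concrete, the following minimal Python sketch (our own illustration, not the authors' released code) shows how a motion library of learned primitives, a hierarchical task graph of the kind an LLM planner could produce, and a morphology selector might fit together. Every class, function, and threshold here is a hypothetical assumption for exposition only.

```python
# Hypothetical sketch of HYPERmotion-style components; names and logic are
# illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class Morphology(Enum):
    SINGLE_ARM = "single_arm"
    DUAL_ARM = "dual_arm"
    LEGGED = "legged"
    WHEELED = "wheeled"


@dataclass
class Primitive:
    """A learned skill stored in the motion library (e.g. trained with RL
    plus whole-body optimization over the actuated joints)."""
    name: str
    morphology: Morphology
    execute: Callable[[], None]


@dataclass
class TaskNode:
    """One node of the hierarchical task graph: leaves hold primitive
    behaviors, internal nodes group subtasks."""
    name: str
    children: List["TaskNode"] = field(default_factory=list)
    primitive: Optional[Primitive] = None

    def run(self) -> None:
        if self.primitive is not None:   # leaf: execute the learned skill
            self.primitive.execute()
        for child in self.children:      # internal node: depth-first order
            child.run()


class MotionLibrary:
    """Stores learned skills by name for retrieval during planning."""
    def __init__(self) -> None:
        self._skills: Dict[str, Primitive] = {}

    def add(self, skill: Primitive) -> None:
        self._skills[skill.name] = skill

    def lookup(self, name: str) -> Primitive:
        return self._skills[name]


def select_manipulation_mode(object_width_m: float) -> Morphology:
    """Toy stand-in for the VLM-grounded morphology selector: objects wider
    than an (assumed) 0.4 m threshold require both arms."""
    return Morphology.DUAL_ARM if object_width_m > 0.4 else Morphology.SINGLE_ARM


if __name__ == "__main__":
    lib = MotionLibrary()
    lib.add(Primitive("walk_to", Morphology.LEGGED, lambda: print("walking to target")))
    lib.add(Primitive("grasp_box", Morphology.DUAL_ARM, lambda: print("dual-arm grasp")))

    mode = select_manipulation_mode(object_width_m=0.6)
    print(f"selected manipulation mode: {mode.value}")

    # "Move the box" decomposed into a two-level task graph of primitives.
    root = TaskNode("move_box", children=[
        TaskNode("approach", primitive=lib.lookup("walk_to")),
        TaskNode("pick", primitive=lib.lookup("grasp_box")),
    ])
    root.run()
```

In this sketch the task graph is executed depth-first, so lower-level skills run in the order the planner laid them out; the real system additionally grounds the selector's decision in distilled spatial geometry and 2D observations rather than a single width threshold.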