In social robotics, a pivotal focus is enabling robots to engage with humans in a more natural and seamless manner. The emergence of advanced large language models (LLMs), such as Generative Pre-trained Transformers (GPT) and Large Language Model Meta AI (Llama), has driven significant advances in integrating natural-language understanding into social robots. This paper presents a system for speech-guided sequential planning in autonomous navigation, built on Llama3 and the Robot Operating System~(ROS). The system uses Llama3 to interpret voice commands, extracts essential details through parsing, and decodes these commands into a sequence of actions. Such sequential planning is essential in many domains, particularly in the pickup and delivery of an object. Once a sequential navigation task has been derived, we employ DRL-VO, a learning-based control policy that allows a robot to navigate autonomously through social spaces containing static infrastructure and crowds of people. We demonstrate the effectiveness of the system in simulation experiments with a Turtlebot 2 in ROS1 and a Turtlebot 3 in ROS2. We also conduct hardware trials on a Clearpath Robotics Jackal UGV, highlighting the system's potential for real-world deployment in scenarios requiring flexible and interactive robotic behaviors.
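The pipeline described above (interpret a voice command, parse out the essential details, decode them into an ordered action sequence) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, regex pattern, and action dictionaries are hypothetical stand-ins for the Llama3 parsing stage, shown here only to make the pickup-and-delivery decoding concrete.

```python
import re


def plan_from_command(transcript: str) -> list[dict]:
    """Toy stand-in for the LLM parsing step: extract a pickup-and-delivery
    plan (object, source, destination) from a simple command pattern.
    A real system would obtain these fields from Llama3 instead of a regex."""
    m = re.search(
        r"pick up the (?P<obj>\w+) (?:from|at) (?P<src>[\w\s]+?) "
        r"and (?:deliver|bring) it to (?P<dst>[\w\s]+)",
        transcript.lower(),
    )
    if not m:
        return []  # command not understood: no plan
    obj, src, dst = m.group("obj"), m.group("src"), m.group("dst")
    # Decode the parsed fields into an ordered sequence of navigation actions;
    # each "navigate" goal would be handed to the DRL-VO controller.
    return [
        {"action": "navigate", "goal": src},
        {"action": "pickup", "object": obj},
        {"action": "navigate", "goal": dst},
        {"action": "dropoff", "object": obj},
    ]
```

For example, "Pick up the parcel from the lobby and deliver it to room 12" decodes into a four-step sequence: navigate to the lobby, pick up the parcel, navigate to room 12, drop it off.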