Recently, the advent of pre-trained large language models (LLMs) such as ChatGPT and GPT-4 has significantly advanced machines' natural language understanding capabilities. This breakthrough allows us to seamlessly integrate open-source LLMs into a unified robot simulator environment, helping robots accurately understand and execute human natural language instructions. To this end, we introduce a realistic robotic manipulation simulator and build on it a Robotic Manipulation with Progressive Reasoning Tasks (RM-PRT) benchmark. Specifically, the RM-PRT benchmark builds a new high-fidelity digital twin scene based on Unreal Engine 5, which includes 782 categories, 2023 objects, and 15K natural language instructions generated by ChatGPT for a detailed evaluation of robot manipulation. We propose a general pipeline for the RM-PRT benchmark that takes as input multimodal prompts containing natural language instructions and automatically outputs actions containing the movement and position transitions. We set four natural language understanding tasks with progressive reasoning levels and evaluate the robot's ability to understand natural language instructions in two modes: adsorption and grasping. In addition, we conduct a comprehensive analysis and comparison of the differences and advantages of 10 different LLMs in instruction understanding and generation quality. We hope the new simulator and benchmark will facilitate future research on language-guided robotic manipulation. Project website: https://necolizer.github.io/RM-PRT/