Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete (e.g., binary) object goal states, which makes it difficult to learn complex tasks and to transfer learned policies from simulated environments to the real world. Furthermore, state discretization limits a robot's ability to follow human instructions based on the grounding of actions and states. To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes. ARNOLD comprises 8 language-conditioned tasks that involve understanding object states and learning policies for continuous goals. To promote language-instructed learning, we provide expert demonstrations with template-generated language descriptions. We assess task performance using the latest language-conditioned policy learning models. Our results indicate that current models for language-conditioned manipulation still struggle significantly with generalization to novel goal states, scenes, and objects. These findings highlight the need to develop new algorithms that address this gap and underscore the potential for further research in this area. Project website: https://arnold-benchmark.github.io.