Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete (e.g., binary) object goal states, which makes it harder both to learn complex tasks and to transfer learned policies from simulated environments to the real world. Furthermore, state discretization limits a robot's ability to follow human instructions that depend on grounding actions and states. To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes. ARNOLD comprises 8 language-conditioned tasks that require understanding object states and learning policies for continuous goals. To promote language-instructed learning, we provide expert demonstrations with template-generated language descriptions. We assess task performance using the latest language-conditioned policy learning models. Our results indicate that current models for language-conditioned manipulation still face significant challenges in generalizing to novel goal states, novel scenes, and novel objects. These findings highlight the need for new algorithms that address this gap and underscore the potential for further research in this area. See our project page at: https://arnold-benchmark.github.io