Foundation models hold significant potential for enabling robots to perform long-horizon general manipulation tasks. However, existing benchmarks feature simple tasks and uniform environments, which limits the effective deployment of these models in complex scenarios. To address this limitation, this paper introduces the \textit{RoboCAS} benchmark, the first benchmark specifically designed for complex object arrangement scenarios in robotic manipulation. The benchmark uses flexible and concise scripted policies to efficiently collect diverse demonstrations covering scattered, orderly, and stacked object arrangements in a highly realistic physical simulation environment. Its tasks involve complex processes such as target retrieval, obstacle clearance, and robot manipulation, and they test agents' abilities to perform long-horizon planning that requires spatial reasoning and to predict chain reactions under ambiguous instructions. Extensive experiments on multiple baseline models reveal their limitations in handling complex object arrangement scenarios. These results underscore the urgent need for intelligent agents capable of long-horizon operation in practical deployments and provide valuable insights for future research directions. Project website: \url{https://github.com/notFoundThisPerson/RoboCAS-v0}.