Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains difficult to gauge how close we are to this vision. The field lacks a reproducible, large-scale benchmark for systematic evaluation. To fill this gap, we present RoboCasa365, a comprehensive simulation benchmark for household mobile manipulation. Built on the RoboCasa platform, RoboCasa365 introduces 365 everyday tasks across 2,500 diverse kitchen environments, with over 600 hours of human demonstration data and over 1,600 hours of synthetically generated demonstration data, making it one of the largest and most diverse resources for studying generalist policies. RoboCasa365 is designed to support systematic evaluation across different problem settings, including multi-task learning, robot foundation model training, and lifelong learning. We conduct extensive experiments on this benchmark with state-of-the-art methods and analyze the impact of task diversity, dataset scale, and environment variation on generalization. Our results provide new insights into which factors most strongly affect the performance of generalist robots and inform strategies for future progress in the field.