We present MOSAIC, a modular architecture that enables home robots to perform complex collaborative tasks, such as cooking with everyday users. MOSAIC collaborates closely with humans, interacts with users in natural language, coordinates multiple robots, and manages an open vocabulary of everyday objects. At its core, MOSAIC employs modularity: it leverages multiple large-scale pre-trained models for general tasks such as language and image recognition, while using streamlined modules designed for task-specific control. We evaluate MOSAIC on 60 end-to-end trials in which two robots collaborate with a human user to cook a combination of 6 recipes. We also test individual modules with 180 episodes of visuomotor picking, 60 episodes of human motion forecasting, and 46 online user evaluations of the task planner. Running the overall system end-to-end with a real human user, MOSAIC completes 68.3% (41/60) of the collaborative cooking trials with a subtask completion rate of 91.6%, showing that it can collaborate efficiently with humans. Finally, we discuss the limitations of the current system and exciting open challenges in this domain. The project website is at https://portal-cornell.github.io/MOSAIC/