We present MOSAIC, a modular architecture that enables home robots to perform complex collaborative tasks, such as cooking with everyday users. MOSAIC collaborates closely with humans, interacts with users in natural language, coordinates multiple robots, and manages an open vocabulary of everyday objects. At its core, MOSAIC employs modularity: it leverages multiple large-scale pre-trained models for general tasks such as language and image recognition, while using streamlined modules designed for task-specific control. We evaluate MOSAIC on 60 end-to-end trials in which two robots collaborate with a human user to cook a combination of 6 recipes. We also test individual modules with 180 episodes of visuomotor picking, 60 episodes of human motion forecasting, and 46 online user evaluations of the task planner. We show that MOSAIC collaborates efficiently with humans: running the overall system end-to-end with a real human user, it completes 68.3% (41/60) of the collaborative cooking trials across 6 different recipes, with a subtask completion rate of 91.6%. Finally, we discuss the limitations of the current system and exciting open challenges in this domain. The project website is at https://portal-cornell.github.io/MOSAIC/