Incorporating language comprehension into robotic operations unlocks significant advancements in robotics, but it also presents distinct challenges, particularly in executing spatially oriented tasks such as pattern formation. This paper introduces ZeroCAP, a novel system that integrates large language models with multi-robot systems for zero-shot, context-aware pattern formation. Grounded in the principles of language-conditioned robotics, ZeroCAP leverages the interpretative power of language models to translate natural language instructions into actionable robotic configurations. The approach combines vision-language models, state-of-the-art segmentation techniques, and shape descriptors, enabling complex, context-driven pattern formations in multi-robot coordination. Through extensive experiments, we demonstrate the system's proficiency in executing complex, context-aware pattern formations across a spectrum of tasks, from surrounding and caging objects to infilling regions. These results not only validate the system's capability to interpret and implement intricate context-driven tasks but also underscore its adaptability and effectiveness across varied environments and scenarios. More details about this work are available at: https://sites.google.com/view/zerocap/home