Large Language Models (LLMs) and strong vision models have enabled rapid research and development of Vision-Language-Action models for robotic control. The main objective of these methods is to develop a generalist policy that can control robots across diverse embodiments. However, in industrial applications such as automated assembly and disassembly, some tasks, such as insertion, demand high accuracy and involve intricate factors like contact engagement, friction handling, and refined motor skills. Implementing such skills within a generalist policy is challenging, because the policy would need to integrate additional sensory data, including force or torque measurements, to achieve the required precision. In our method, we present an LLM-based global control policy that can delegate control, through dynamic context switching, to a finite set of skills specifically trained to perform high-precision tasks. The integration of LLMs into this framework underscores their value not only in interpreting and processing language inputs but also in enriching the control mechanisms for diverse and intricate robotic operations.
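The delegation pattern described above can be illustrated with a minimal sketch. This is not the paper's implementation: all names (`global_policy`, `control_step`, the skill registry) are hypothetical, and the LLM-based global policy is stubbed with a keyword match. It only shows the control flow in which a global policy selects a specialized skill, which then consumes richer sensory input such as force measurements.

```python
# Hypothetical sketch of dynamic context switching between a global
# policy and specialized high-precision skills. The LLM call is
# replaced by a trivial keyword match for illustration.

from typing import Callable, Dict


def insertion_skill(observation: dict) -> str:
    # Placeholder for a force/torque-aware high-precision controller.
    return f"insert using force feedback {observation.get('force', 0.0):.2f} N"


def pick_place_skill(observation: dict) -> str:
    # Placeholder for a general vision-based pick-and-place controller.
    return "pick and place via vision-based grasping"


# Finite set of specifically trained skills (hypothetical registry).
SKILLS: Dict[str, Callable[[dict], str]] = {
    "insertion": insertion_skill,
    "pick_place": pick_place_skill,
}


def global_policy(instruction: str) -> str:
    # Stand-in for the LLM-based global policy: maps a language
    # instruction to the name of a specialized skill.
    if "insert" in instruction.lower():
        return "insertion"
    return "pick_place"


def control_step(instruction: str, observation: dict) -> str:
    # Dynamic context switch: the global policy selects the skill,
    # which then processes the sensory observation it needs.
    skill_name = global_policy(instruction)
    return SKILLS[skill_name](observation)


print(control_step("Insert the peg into the hole", {"force": 1.5}))
```

In a real system, the keyword stub would be an LLM queried with the task description and scene context, and each skill would be a separately trained low-level controller with access to its own sensor streams.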