Cooking tasks involve large changes in the state of the food, which is one of the major challenges for robots executing them. In particular, heating foodstuff on a stove causes many distinctive state changes that are not seen in other tasks, making it difficult to design a recognizer. In this study, we propose a unified method for robots to recognize state changes of food during cooking by applying a vision-language model, which can discriminate open-vocabulary objects, in a time-series manner. We collected data on four typical state changes in cooking using a real robot and confirmed the effectiveness of the proposed method. We also compared different conditions and discuss which types of natural language prompts and which image regions are suitable for recognizing the state changes.
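As an illustration of what applying a vision-language model in a time-series manner could look like, the following is a minimal sketch and not the implementation used in this study. It assumes that each camera frame and two text prompts (one describing the state before the change, one describing the state after) have already been embedded into a shared vector space by some vision-language model. The toy 2-D embeddings, the score definition (similarity to the "after" prompt minus similarity to the "before" prompt), the trailing moving average, and the threshold are all hypothetical choices made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def state_scores(frame_embs, before_emb, after_emb):
    """Per-frame score: similarity to the 'after' prompt minus the 'before' prompt."""
    return [cosine(f, after_emb) - cosine(f, before_emb) for f in frame_embs]

def moving_average(xs, window):
    """Trailing moving average that damps frame-to-frame noise."""
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def detect_change(frame_embs, before_emb, after_emb, window=3, threshold=0.0):
    """Return the index of the first frame whose smoothed score exceeds
    the threshold, or None if no frame does."""
    scores = state_scores(frame_embs, before_emb, after_emb)
    smoothed = moving_average(scores, window)
    for i, s in enumerate(smoothed):
        if s > threshold:
            return i
    return None

# Toy 2-D embeddings (hypothetical): a real model would produce
# high-dimensional vectors for the prompts and for each frame.
before = [1.0, 0.0]   # stands in for a prompt describing the initial state
after = [0.0, 1.0]    # stands in for a prompt describing the changed state
frames = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
          [0.3, 0.7], [0.1, 0.9], [0.0, 1.0]]

change_index = detect_change(frames, before, after)
print(change_index)
```

In this sketch, a larger smoothing window delays detection but makes it less sensitive to single noisy frames. Which prompts to use and which image region to embed are left open here.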