We consider the task of generating a segmentation mask for the target object specified in an object manipulation instruction, a task that allows users to give open-vocabulary instructions to domestic service robots. Conventional segmentation generation approaches often fail to account for two cases: objects outside the camera's field of view, and predictions whose vertices are ordered differently from the ground truth but still describe the same polygon. Both failures lead to erroneous mask generation. In this study, we propose a novel method that generates segmentation masks from open-vocabulary instructions. We introduce a novel loss function based on optimal transport that avoids assigning a large loss when the predicted vertices describe the same polygon as the ground truth in a different order. To evaluate our approach, we constructed a new dataset based on the REVERIE and Matterport3D datasets. The results demonstrated the effectiveness of the proposed method compared with existing mask generation methods. Notably, our best model achieved a +16.32% improvement on this dataset over a representative polygon-based method.
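The vertex-order problem can be illustrated with a small sketch. This is not the authors' loss: the abstract does not specify the cost function, vertex weights, or solver, so the code below assumes uniform mass on each vertex, a squared Euclidean cost, and entropic optimal transport solved with log-domain Sinkhorn iterations. The function names `naive_vertex_loss` and `sinkhorn_vertex_loss` and the values of `eps` and `n_iters` are hypothetical. The sketch contrasts an index-wise loss, which penalizes a cyclically shifted or reversed vertex list, with an OT loss that treats the vertices as a point set.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def naive_vertex_loss(pred: List[Point], target: List[Point]) -> float:
    """Index-wise mean squared distance; changes when vertex order changes."""
    return sum(
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred, target)
    ) / len(pred)


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn_vertex_loss(
    pred: List[Point], target: List[Point], eps: float = 0.01, n_iters: int = 200
) -> float:
    """Entropic OT transport cost between two vertex sets (hypothetical sketch).

    Assumptions: uniform mass on every vertex, squared Euclidean cost,
    log-domain Sinkhorn with regularization eps. Vertex order is ignored.
    """
    n, m = len(pred), len(target)
    cost = [[(px - tx) ** 2 + (py - ty) ** 2 for (tx, ty) in target] for (px, py) in pred]
    log_a, log_b = -math.log(n), -math.log(m)  # uniform marginals 1/n and 1/m
    f, g = [0.0] * n, [0.0] * m  # dual potentials
    for _ in range(n_iters):
        # Alternate updates: f fixes the plan's row sums, g fixes its column sums.
        f = [eps * log_a - eps * _logsumexp([(g[j] - cost[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [eps * log_b - eps * _logsumexp([(f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(m)]
    # Transport plan P_ij = exp((f_i + g_j - C_ij) / eps); the loss is sum_ij P_ij * C_ij.
    return sum(
        math.exp((f[i] + g[j] - cost[i][j]) / eps) * cost[i][j]
        for i in range(n) for j in range(m)
    )


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = square[1:] + square[:1]  # same polygon, start vertex moved
print(naive_vertex_loss(square, shifted))     # 1.0
print(sinkhorn_vertex_loss(square, shifted))  # close to 0
```

Because this OT loss compares vertex sets without regard to order, it also discards order information that does matter: two different polygons drawn through the same vertices (for example, a simple and a self-intersecting one) would receive the same near-zero loss under this sketch.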