In robotics and automation, conventional object recognition and instance segmentation methods struggle to perceive Deformable Linear Objects (DLOs) such as wires, cables, and flexible tubes. The difficulty arises primarily from their lack of distinctive attributes such as shape, color, and texture, which calls for tailored solutions to achieve precise identification. In this work, we propose a foundation model-based DLO instance segmentation technique that is text-promptable and user-friendly. Specifically, our approach combines the text-conditioned semantic segmentation capabilities of the CLIPSeg model with the zero-shot generalization capabilities of the Segment Anything Model (SAM). We show that our method exceeds state-of-the-art (SOTA) performance on DLO instance segmentation, achieving an mIoU of $91.21\%$. We also introduce a rich and diverse DLO-specific dataset for instance segmentation.
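For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean intersection-over-union (mIoU) score can be computed over pairs of binary masks. It is illustrative only: the function names `iou` and `mean_iou`, the nested-list mask representation, and the convention that two empty masks score 1.0 are assumptions made here, and the exact evaluation protocol behind the $91.21\%$ figure (for example, how predicted instances are matched to ground-truth instances) is not specified in this abstract.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical mask representation: rows of 0/1 values, 1 = DLO pixel.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return inter / union if union else 1.0


def mean_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (predicted, ground-truth) mask pairs."""
    scores = [iou(pred, target) for pred, target in pairs]
    if not scores:
        raise ValueError("mean_iou needs at least one mask pair")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 0]]
    # One partially overlapping pair and one identical pair.
    print(mean_iou([(predicted, ground_truth), (ground_truth, ground_truth)]))
```

In this sketch each pair contributes equally to the average; a per-image or per-class weighting would be an equally plausible reading of "mIoU" and would change only the aggregation in `mean_iou`.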