In large-scale warehouses, precise instance masks are crucial for robotic bin picking but are hard to obtain. Existing instance segmentation methods typically require a tedious cycle of scene collection, mask annotation, and network fine-tuning for every Stock Keeping Unit (SKU). This paper presents SKU-Patch, a new patch-guided instance segmentation solution that uses only a few image patches of each incoming SKU to predict accurate and robust masks, without tedious manual effort or model re-training. Technically, we design a novel transformer-based network with (i) a patch-image correlation encoder that captures multi-level image features calibrated by patch information and (ii) a patch-aware transformer decoder with parallel task heads that generates instance masks. Extensive experiments on four warehouse benchmarks show that SKU-Patch outperforms state-of-the-art methods. In addition, SKU-Patch achieves an average grasping success rate of nearly 100% on more than 50 unseen SKUs in a robot-aided auto-store logistics pipeline, demonstrating its effectiveness and practicality.
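To make the idea of image features "calibrated by patch information" concrete, the following is a minimal sketch of one generic form of patch-image correlation. It is not the SKU-Patch encoder, whose internals the abstract does not specify: the cosine-similarity correlation, the multiplicative calibration, and the names `cosine`, `correlate`, and `calibrate` are all assumptions made for illustration, and the feature values are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def correlate(feature_map, patch_embedding):
    """Return an H x W map of similarities between each location and the patch."""
    return [[cosine(vec, patch_embedding) for vec in row] for row in feature_map]


def calibrate(feature_map, patch_embedding):
    """Scale each location's feature vector by its patch similarity, clamped at 0.

    Hypothetical calibration rule: locations that resemble the patch keep
    their features, while dissimilar locations are suppressed.
    """
    corr = correlate(feature_map, patch_embedding)
    return [
        [[max(c, 0.0) * x for x in vec] for vec, c in zip(row, corr_row)]
        for row, corr_row in zip(feature_map, corr)
    ]


# Toy 2 x 2 feature map with 2-dimensional features (made-up values).
feature_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [-1.0, 0.0]],
]
# Toy embedding standing in for one SKU image patch (also made up).
patch_embedding = [1.0, 0.0]

corr = correlate(feature_map, patch_embedding)
calibrated = calibrate(feature_map, patch_embedding)
```

In this toy case only the two locations whose features point in the same direction as the patch embedding survive calibration. A learned encoder would presumably apply a comparable operation at several feature levels, which is one way to read "multi-level" in the abstract.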