Novel Instance Detection and Segmentation (NIDS) aims to detect and segment novel object instances given only a few examples of each instance. We propose a unified framework, NIDS-Net, comprising three stages: object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advances in large vision models, we use Grounding DINO and the Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We average the patch embeddings from a DINOv2 ViT backbone over foreground regions, then refine the result with a weight adapter mechanism that we introduce. We show experimentally that the weight adapter adjusts the embeddings locally within their feature space and effectively limits overfitting. These embeddings enable a straightforward matching strategy that yields significant performance gains. Our framework surpasses current state-of-the-art methods, with improvements of 22.3, 46.2, 10.3, and 24.0 in average precision (AP) across four detection datasets. For instance segmentation on the seven core datasets of the BOP challenge, our method outperforms the top RGB methods by 3.6 AP and remains competitive with the best RGB-D method. Code is available at: https://github.com/YoungSean/NIDS-Net
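The embedding and matching stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the released NIDS-Net implementation: the function names, the use of cosine similarity, the arg-max assignment, and the `threshold` parameter are all assumptions made for illustration, and the weight adapter is omitted because the abstract does not specify its form. Patch embeddings are assumed to arrive as an `(H, W, D)` array with a matching `(H, W)` foreground mask.

```python
import numpy as np


def foreground_average(patch_embeddings, mask):
    """Average patch embeddings over foreground patches only.

    patch_embeddings: array of shape (H, W, D), one D-dim vector per patch.
    mask: array of shape (H, W); nonzero/True marks foreground patches.
    Returns a single D-dim instance embedding.
    """
    fg = patch_embeddings[np.asarray(mask, dtype=bool)]  # (N_fg, D)
    if fg.shape[0] == 0:
        raise ValueError("mask contains no foreground patches")
    return fg.mean(axis=0)


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def match_proposals(proposal_embs, template_embs, template_labels, threshold=0.5):
    """Assign each proposal the label of its most similar template.

    Assumption: cosine similarity with arg-max assignment; proposals whose
    best similarity falls below `threshold` get the label -1 (background).
    """
    sims = l2_normalize(proposal_embs) @ l2_normalize(template_embs).T  # (P, T)
    best = sims.argmax(axis=1)
    scores = sims[np.arange(len(best)), best]
    labels = np.where(scores >= threshold, np.asarray(template_labels)[best], -1)
    return labels, scores
```

In this sketch, each template image and each proposal region would be passed through `foreground_average` with its mask, and the resulting vectors stacked into `template_embs` and `proposal_embs` before calling `match_proposals`.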