Do deep learning models for instance segmentation generalize to novel objects in a systematic way? For classification, such behavior has been questioned. In this study, we aim to understand whether certain design decisions, such as framework, architecture, or pre-training, contribute to the semantic understanding of instance segmentation. To answer this question, we consider a special case of robustness and compare pre-trained models on a challenging benchmark for object-centric, out-of-distribution texture. We do not introduce another method in this work. Instead, we take a step back and evaluate a broad range of existing literature, including Cascade and Mask R-CNN, Swin Transformer, BMask, YOLACT(++), DETR, BCNet, SOTR, and SOLOv2. We find that YOLACT++, SOTR, and SOLOv2 are significantly more robust to out-of-distribution texture than the other frameworks. In addition, we show that deeper and dynamic architectures improve robustness, whereas training schedules, data augmentation, and pre-training have only a minor impact. In summary, we evaluate 68 models on 61 versions of MS COCO for a total of 4,148 evaluations.