Unified panoptic segmentation methods achieve state-of-the-art results on several datasets. To reach these results on high-resolution datasets, they rely on crop-based training. In this work, we find that, although crop-based training is advantageous in general, it also has a harmful side effect: it limits the ability of unified networks to discriminate between large object instances, causing them to make predictions that are confused between multiple instances. To solve this, we propose Intra-Batch Supervision (IBS), which improves a network's ability to discriminate between instances by introducing additional supervision using multiple images from the same batch. We show that IBS successfully addresses the confusion problem and consistently improves the performance of unified networks. On the high-resolution Cityscapes and Mapillary Vistas datasets, we achieve improvements of up to +2.5 in Panoptic Quality for thing classes, and even larger gains of up to +5.8 in both pixel accuracy and pixel precision, which we identify as better metrics for capturing the confusion problem.
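To make the crop-based training setting concrete, the following is a minimal sketch of random cropping applied jointly to an image and its instance-ID map. It is not the authors' data pipeline: the function names `random_crop` and `visible_fraction`, the 512x1024 crop size, and the synthetic instance are all assumptions chosen for illustration. It shows only that an instance larger than the crop window can never be fully visible in a single training sample, which is the condition under which the abstract reports confusion between large instances.

```python
import numpy as np


def random_crop(image, instance_map, crop_h, crop_w, rng):
    """Cut the same random window out of an image and its instance-ID map.

    Hypothetical helper for illustration; not taken from the paper.
    """
    h, w = instance_map.shape
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed the image size")
    # Sample the top-left corner so the window stays inside the image.
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return image[window], instance_map[window]


def visible_fraction(full_map, cropped_map, instance_id):
    """Fraction of an instance's pixels that survive the crop."""
    total = int(np.count_nonzero(full_map == instance_id))
    if total == 0:
        return 0.0
    return np.count_nonzero(cropped_map == instance_id) / total


# Synthetic example at Cityscapes resolution (1024x2048).
rng = np.random.default_rng(0)
image = np.zeros((1024, 2048, 3), dtype=np.uint8)
instance_map = np.zeros((1024, 2048), dtype=np.int32)
instance_map[200:900, 100:1900] = 1  # one large instance (assumed layout)

# Assumed crop size of 512x1024, used here only as an example.
crop_img, crop_map = random_crop(image, instance_map, 512, 1024, rng)
fraction = visible_fraction(instance_map, crop_map, instance_id=1)
```

Because the synthetic instance covers 700x1800 pixels and the crop covers 512x1024, `fraction` stays below 1 for every possible window position, so the network only ever sees part of this instance during training.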