Methods for object detection and segmentation often require abundant instance-level annotations for training, which are time-consuming and expensive to collect. To address this, the task of zero-shot object detection (or segmentation) aims at learning effective methods for identifying and localizing object instances of categories for which no supervision is available. Constructing architectures for these tasks requires choosing from a myriad of design options, ranging from the form of the class encoding used to transfer information from seen to unseen categories, to the nature of the function being optimized for learning. In this work, we extensively study these design choices and carefully construct a simple yet extremely effective zero-shot recognition method. Through extensive experiments on object detection and segmentation on the MSCOCO dataset, we show that our proposed method outperforms existing, considerably more complex architectures. Our findings and method, which we propose as a competitive future baseline, point towards the need to revisit some of the recent design trends in zero-shot detection and segmentation.