Tiny object detection has become an active area of research because images with tiny targets are common in several important real-world scenarios. However, existing tiny object detection methods use standard deep neural networks as their backbone architecture. We argue that such backbones are inappropriate for detecting tiny objects because they are designed for the classification of larger objects and lack the spatial resolution needed to identify small targets. Specifically, such backbones use max-pooling or a large stride at early stages in the architecture. This produces lower-resolution feature maps that can be efficiently processed by subsequent layers. However, such low-resolution feature maps do not contain information that can reliably discriminate tiny objects. To solve this problem, we design 'bottom-heavy' versions of backbones that allocate more resources to processing higher-resolution features without introducing any additional computational burden overall. We also investigate whether pre-training these backbones on images of appropriate size, using CIFAR100 and ImageNet32, can further improve performance on tiny object detection. Results on TinyPerson and WiderFace show that detectors with our proposed backbones achieve better results than the current state-of-the-art methods.
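The resolution argument can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the authors' architecture. It assumes a ResNet-style stem (a 7x7 stride-2 convolution followed by a 3x3 stride-2 max-pool) as an example of a "standard" backbone, and the input size, object size, and channel widths are hypothetical values chosen for the example. It shows how few feature-map cells a tiny object covers after early downsampling, and one simple way a layer could run at twice the resolution for the same cost, here by halving its channel width.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(height: int, width: int, c_in: int, c_out: int, kernel: int = 3) -> int:
    """Multiply-accumulate count of a standard 2-D convolution,
    given its output resolution and channel widths."""
    return height * width * c_in * c_out * kernel * kernel


def resnet_style_stem(size: int) -> int:
    """Output size after an assumed ResNet-style stem:
    7x7 conv (stride 2, padding 3), then 3x3 max-pool (stride 2, padding 1)."""
    after_conv = conv_out_size(size, kernel=7, stride=2, padding=3)
    return conv_out_size(after_conv, kernel=3, stride=2, padding=1)


# Hypothetical values, chosen only for illustration.
INPUT_SIZE = 640   # square input image, in pixels
OBJECT_SIZE = 16   # side length of a tiny object, in pixels

stem_size = resnet_style_stem(INPUT_SIZE)
overall_stride = INPUT_SIZE // stem_size
object_cells = OBJECT_SIZE / overall_stride

print(f"feature map after stem: {stem_size}x{stem_size} (stride {overall_stride})")
print(f"a {OBJECT_SIZE}px object spans about {object_cells:.1f} cells per side")

# Cost of one 3x3 convolution at the stem resolution with 64 channels ...
low_res = conv_macs(stem_size, stem_size, 64, 64)
# ... versus twice the resolution at the same width (4x the cost) ...
high_res_same_width = conv_macs(2 * stem_size, 2 * stem_size, 64, 64)
# ... versus twice the resolution with half the channels (same cost).
high_res_half_width = conv_macs(2 * stem_size, 2 * stem_size, 32, 32)

print(f"low-res, 64 ch:  {low_res:,} MACs")
print(f"high-res, 64 ch: {high_res_same_width:,} MACs")
print(f"high-res, 32 ch: {high_res_half_width:,} MACs")
```

Because the cost of a convolution scales with the product of spatial area and channel widths, doubling the resolution quadruples the cost unless something else shrinks. Halving the channels is only one way to balance the budget; the abstract does not say which trade-off the proposed backbones actually use.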