Achieving consistent accuracy in object detection is challenging because object sizes vary widely. One effective remedy is to optimize the input resolution, an approach known as a multi-resolution strategy. Previous resolution-optimization methods have mostly relied on pre-defined resolutions chosen manually, while run-time resolution optimization for existing architectures remains largely unstudied. This paper introduces DyRA, a dynamic resolution adjustment network that provides an image-specific scale factor for existing detectors. DyRA is co-trained with the detector using two specially designed loss functions, ParetoScaleLoss and BalanceLoss. ParetoScaleLoss determines an adaptive scale factor for robustness, while BalanceLoss optimizes the overall scale factors according to the localization performance of the detector. Together, these losses are designed to minimize the accuracy drop caused by the conflicting scaling objectives of objects of different sizes. Our proposed network improves accuracy across various detectors, including RetinaNet, Faster R-CNN, FCOS, DINO, and H-Deformable-DETR. The code is available at https://github.com/DaEunFullGrace/DyRA.git.
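The abstract does not describe how the image-specific scale factor is produced or applied, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical scale network that emits one unbounded logit per image, maps that logit to a factor in [1/max_scale, max_scale] with a tanh in log space (so a logit of 0 leaves the image unchanged), resizes the image with nearest-neighbour sampling, and maps detected boxes back to original-image coordinates. All function names and the `max_scale` bound are hypothetical, and ParetoScaleLoss and BalanceLoss are not modeled.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]  # hypothetical: a single-channel image stored as rows of pixels
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def bounded_scale(logit: float, max_scale: float = 2.0) -> float:
    """Map an unbounded network output to a scale factor in [1/max_scale, max_scale].

    Assumed parameterization: tanh in log space, so logit 0 -> scale 1.0,
    positive logits upscale the image and negative logits downscale it.
    """
    return max_scale ** math.tanh(logit)


def scaled_size(height: int, width: int, scale: float) -> Tuple[int, int]:
    """Target (H, W) after multiplying both sides by `scale` (never below 1 pixel)."""
    return max(1, round(height * scale)), max(1, round(width * scale))


def resize_nearest(image: Image, new_h: int, new_w: int) -> Image:
    """Nearest-neighbour resize; a real pipeline would typically use bilinear interpolation."""
    h, w = len(image), len(image[0])
    return [
        [image[min(h - 1, r * h // new_h)][min(w - 1, c * w // new_w)] for c in range(new_w)]
        for r in range(new_h)
    ]


def boxes_to_original(
    boxes: Sequence[Box], orig_hw: Tuple[int, int], new_hw: Tuple[int, int]
) -> List[Box]:
    """Undo the resize so detections are reported in original-image coordinates."""
    sy = new_hw[0] / orig_hw[0]  # actual vertical scale after integer rounding
    sx = new_hw[1] / orig_hw[1]  # actual horizontal scale after integer rounding
    return [(x1 / sx, y1 / sy, x2 / sx, y2 / sy) for x1, y1, x2, y2 in boxes]


def rescale_for_detection(image: Image, logit: float, max_scale: float = 2.0):
    """Apply an image-specific scale factor before handing the image to a detector."""
    scale = bounded_scale(logit, max_scale)
    new_h, new_w = scaled_size(len(image), len(image[0]), scale)
    return resize_nearest(image, new_h, new_w), scale


if __name__ == "__main__":
    img = [[1, 2], [3, 4]]
    upscaled, s = rescale_for_detection(img, logit=5.0)
    print(round(s, 3), len(upscaled), len(upscaled[0]))
```

Bounding the factor symmetrically in log space is one way to let a single output both enlarge images dominated by small objects and shrink images dominated by large ones; the actual range and parameterization used by DyRA may differ. Mapping boxes back with the per-axis ratio of the rounded sizes, rather than the raw scale, keeps the coordinates consistent with the image the detector actually saw.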