Object detection models, a prominent class of machine learning algorithms, aim to identify and precisely locate objects in images or videos. However, performance on this task can be uneven, in part because of variation in object size and in the quality of the images and labels used for training. In this paper, we highlight the importance of large objects in learning features that are critical for objects of all sizes. Given these findings, we propose introducing a weighting term into the training loss that is a function of the object's area. We show that giving more weight to large objects improves detection scores across all object sizes, and therefore improves overall object detector performance (+2 p.p. of mAP on small objects, +2 p.p. on medium objects, and +4 p.p. on large objects on COCO val 2017 with InternImage-T). Additional experiments and ablation studies with different models and on a different dataset further confirm the robustness of our findings.
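To make the idea of an area-dependent loss weight concrete, the following is a minimal sketch and not the formulation used in the paper. The abstract does not specify the weighting function, so the logarithmic form `log(1 + width * height)`, the normalization by the sum of weights, and the names `area_weight` and `weighted_detection_loss` are all assumptions made for illustration.

```python
import math


def area_weight(width: float, height: float) -> float:
    """Hypothetical weight that increases with box area.

    Assumption: log(1 + area). The logarithm keeps the weight of very
    large boxes from dominating; the abstract does not state this form.
    """
    return math.log1p(width * height)


def weighted_detection_loss(per_object_losses, boxes_wh):
    """Combine per-object losses using area-dependent weights.

    per_object_losses: list of floats, one loss value per ground-truth object.
    boxes_wh: list of (width, height) tuples in pixels, same order.
    Assumption: weights are normalized by their sum so that the overall
    loss scale stays comparable to an unweighted mean.
    """
    weights = [area_weight(w, h) for w, h in boxes_wh]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, per_object_losses)) / total


# Example: a small 10x10 object and a large 100x100 object.
losses = [1.0, 3.0]
boxes = [(10, 10), (100, 100)]
loss = weighted_detection_loss(losses, boxes)
```

In this example the larger object contributes more to the combined loss than it would under a plain average, which is the behavior the weighting term is meant to produce. Other monotonically increasing functions of the area could be substituted for `area_weight` without changing the rest of the sketch.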