As object detection techniques continue to evolve, understanding their relationships with complementary visual tasks becomes crucial for optimising model architectures and computational resources. This paper investigates the correlations between object detection accuracy and two fundamental visual tasks: depth prediction and visual saliency prediction. Through comprehensive experiments using state-of-the-art models (DeepGaze IIE, Depth Anything, DPT-Large, and Itti's model) on the COCO and Pascal VOC datasets, we find that visual saliency correlates consistently more strongly with object detection accuracy (mA$\rho$ up to 0.459 on Pascal VOC) than depth prediction does (mA$\rho$ up to 0.283). Our analysis reveals significant variations in these correlations across object categories, with larger objects showing correlation values up to three times higher than those of smaller objects. These findings suggest that incorporating visual saliency features into object detection architectures could be more beneficial than incorporating depth information, particularly for specific object categories. The observed category-specific variations also provide insights for targeted feature engineering and dataset design improvements, potentially leading to more efficient and accurate object detection systems.
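The abstract reports correlations as mA$\rho$ without defining the metric. As a rough illustration only, the sketch below assumes one plausible reading: a Pearson correlation between a predicted map (saliency or depth) and a binary object mask, averaged within each category and then across categories. The function names `pearson` and `mean_average_correlation`, the input layout, and the averaging order are all assumptions made for this example and may differ from the paper's actual definition.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two arrays of the same shape.

    Returns 0.0 when either input is constant (correlation undefined).
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def mean_average_correlation(samples):
    """Average per-sample correlations per category, then across categories.

    `samples` is an iterable of (category, predicted_map, object_mask)
    tuples. This layout is hypothetical, chosen for the example.
    """
    per_category = {}
    for category, predicted_map, object_mask in samples:
        rho = pearson(predicted_map, object_mask)
        per_category.setdefault(category, []).append(rho)
    category_means = {c: float(np.mean(v)) for c, v in per_category.items()}
    overall = float(np.mean(list(category_means.values())))
    return overall, category_means
```

Averaging within categories before averaging across them keeps frequent categories from dominating the overall score, which also makes per-category differences, such as those between large and small objects, directly visible in `category_means`.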