In this paper, we focus on a scenario where a single image contains objects of the same category but varying sizes, and we propose a lightweight approach that can not only recognize their category labels but also their real sizes. Our approach utilizes commonsense knowledge to assist a deep neural network (DNN) based coarse-grained object detector to achieve accurate size-related fine-grained detection. Specifically, we introduce a commonsense knowledge inference module (CKIM) that maps the coarse-grained labels produced by the DL detector to size-related fine-grained labels. Experimental results demonstrate that our approach achieves accurate fine-grained detections with a reduced amount of annotated data, and smaller model size, compared with baseline methods. Our code is available at: https://github.com/ZJLAB-AMMI/CKIM.
翻译:本文聚焦于同一图像中包含相同类别但尺寸不同的对象的场景,并提出一种轻量级方法,该方法不仅能识别这些对象的类别标签,还能识别其真实尺寸。我们的方法利用常识知识辅助基于深度神经网络的粗粒度目标检测器,以实现准确的尺寸相关细粒度检测。具体而言,我们引入了一个常识知识推理模块(CKIM),该模块将深度学习检测器生成的粗粒度标签映射为尺寸相关的细粒度标签。实验结果表明,与基线方法相比,我们的方法在减少标注数据量且模型规模更小的情况下,实现了准确的细粒度检测。我们的代码可在以下地址获取:https://github.com/ZJLAB-AMMI/CKIM。