This paper addresses fine-grained object detection in scenarios with limited computing resources, such as edge computing. In particular, we focus on the case where a single image contains objects of the same category but of varying sizes, and we seek an algorithm that not only recognizes the physical class of each object but also detects its size. Deep learning (DL), particularly through deep neural networks (DNNs), has become the primary approach to object detection. However, accurate fine-grained detection requires a large DNN model and a significant amount of annotated data, which makes the problem especially challenging in resource-constrained scenarios. To this end, we propose an approach that uses commonsense knowledge to help a coarse-grained object detector produce accurate size-related fine-grained detection results. Specifically, we introduce a commonsense knowledge inference module (CKIM) that processes the coarse-grained labels produced by a benchmark coarse-grained DL detector to generate size-related fine-grained labels. Our CKIM explores both crisp-rule and fuzzy-rule based inference methods, with the latter employed to handle ambiguity in the target size-related labels. We implement our method on top of two modern DL detectors, Mobilenet-SSD and YOLOv7-tiny. Experimental results demonstrate that our approach achieves accurate fine-grained detection with less annotated data and a smaller model size. Our code is available at https://github.com/ZJLAB-AMMI/CKIM.
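To make the distinction between crisp-rule and fuzzy-rule inference concrete, the following minimal sketch may help. It is not the released CKIM implementation: it assumes, purely for illustration, that the size cue is the bounding-box area in pixels, that there are only two size labels ("small" and "large"), and that the fuzzy rule uses a linear-ramp membership function. The names `Detection`, `crisp_size_label`, `fuzzy_memberships`, `fuzzy_size_label`, and all threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Coarse-grained detector output (hypothetical structure)."""
    label: str     # coarse-grained class, e.g. "cup"
    width: float   # bounding-box width in pixels
    height: float  # bounding-box height in pixels


def crisp_size_label(det: Detection, area_threshold: float) -> str:
    """Crisp rule: a single hard threshold on box area decides the size label."""
    area = det.width * det.height
    size = "large" if area >= area_threshold else "small"
    return f"{size}_{det.label}"


def fuzzy_memberships(area: float, low: float, high: float) -> dict:
    """Fuzzy rule: degrees of membership in 'small' and 'large'.

    Assumed linear ramp: fully 'small' at or below `low`, fully 'large'
    at or above `high`, and a graded mix in between.
    """
    if area <= low:
        large = 0.0
    elif area >= high:
        large = 1.0
    else:
        large = (area - low) / (high - low)
    return {"small": 1.0 - large, "large": large}


def fuzzy_size_label(det: Detection, low: float, high: float):
    """Pick the size label with the highest membership; also return the degrees."""
    degrees = fuzzy_memberships(det.width * det.height, low, high)
    size = max(degrees, key=degrees.get)
    return f"{size}_{det.label}", degrees


if __name__ == "__main__":
    cup = Detection(label="cup", width=40.0, height=50.0)  # area = 2000
    print(crisp_size_label(cup, area_threshold=3000.0))
    print(fuzzy_size_label(cup, low=1000.0, high=5000.0))
```

In this sketch the crisp rule gives a hard decision, while the fuzzy rule additionally exposes how strongly an object belongs to each size label, which is one way ambiguity near the boundary between sizes could be represented.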