In this paper, we consider fine-grained image object detection in resource-constrained settings such as edge computing. Deep learning (DL), that is, learning with deep neural networks (DNNs), has become the dominant approach to object detection. Accurate fine-grained detection requires a sufficiently large DNN model and a vast amount of annotated data, which makes modern DL object detectors hard to use in resource-constrained settings. To address this, we propose an approach that leverages commonsense knowledge to help a coarse-grained object detector produce accurate fine-grained detection results. Specifically, we introduce a commonsense knowledge inference module (CKIM) that processes the coarse-grained labels given by a benchmark DL detector to produce fine-grained labels. We consider both crisp-rule and fuzzy-rule based inference in our CKIM; the latter is used to handle ambiguity in the target semantic labels. We implement our method on top of several modern DL detectors, namely YOLOv4, Mobilenetv3-SSD and YOLOv7-tiny. Experimental results show that our approach markedly outperforms the benchmark detectors in terms of accuracy, model size and processing latency.
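To make the distinction between crisp-rule and fuzzy-rule inference concrete, the following is a minimal sketch and not the CKIM implementation itself. It assumes, purely for illustration, that the fine-grained label is a size category ("small" or "large") inferred from the ratio of bounding-box area to image area. The `Detection` type, the function names, the thresholds and the choice of feature are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one coarse-grained detection."""
    coarse_label: str   # label emitted by the coarse-grained detector
    box_area: float     # bounding-box area in pixels
    image_area: float   # full image area in pixels


def crisp_rule(det: Detection, threshold: float = 0.1) -> str:
    """Crisp inference: a single hard threshold on the area ratio.

    The threshold value is an illustrative assumption.
    """
    ratio = det.box_area / det.image_area
    size = "large" if ratio >= threshold else "small"
    return f"{size}_{det.coarse_label}"


def fuzzy_rule(det: Detection, low: float = 0.05, high: float = 0.15):
    """Fuzzy inference: graded memberships instead of a hard cut.

    Membership in "large" ramps linearly from 0 at `low` to 1 at `high`;
    membership in "small" is its complement. Both bounds are assumptions.
    Returns the label with the highest membership plus all memberships.
    """
    ratio = det.box_area / det.image_area
    large = min(1.0, max(0.0, (ratio - low) / (high - low)))
    memberships = {"small": 1.0 - large, "large": large}
    size = max(memberships, key=memberships.get)
    return f"{size}_{det.coarse_label}", memberships


if __name__ == "__main__":
    det = Detection(coarse_label="cup", box_area=1200.0, image_area=10000.0)
    print(crisp_rule(det))
    print(fuzzy_rule(det))
```

In this sketch the crisp rule commits to one label at a fixed boundary, whereas the fuzzy rule exposes how strongly each label applies, which is one way ambiguity near the boundary could be represented.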