Advances in lightweight neural networks have revolutionized computer vision in a broad range of IoT applications, including remote monitoring and process automation. However, the detection of small objects, which is crucial for many of these applications, remains an underexplored area in current computer vision research, particularly for low-power embedded devices that host resource-constrained processors. To address this gap, this paper proposes an adaptive tiling method for lightweight and energy-efficient object detection networks, including YOLO-based models and the popular FOMO network. The proposed tiling enables object detection on low-power MCUs with no compromise on accuracy compared to large-scale detection models. The benefit of the proposed method is demonstrated by applying it to FOMO and TinyissimoYOLO networks on a novel RISC-V-based MCU with built-in ML accelerators. Extensive experimental results show that the proposed tiling method boosts the F1-score by up to 225% for both FOMO and TinyissimoYOLO networks, while reducing the average object count error by up to 76% with FOMO and up to 89% with TinyissimoYOLO. Furthermore, the findings of this work indicate that using a soft F1 loss in place of the popular binary cross-entropy loss can serve as an implicit non-maximum suppression for the FOMO network. To evaluate real-world performance, the networks are deployed on the RISC-V-based GAP9 microcontroller from GreenWaves Technologies, showcasing the proposed method's ability to strike a balance between detection performance (58%–95% F1 score), low latency (0.6–16.2 ms/inference), and energy efficiency (31 µJ–1.27 mJ/inference) while performing multiple predictions using high-resolution images on an MCU.
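The abstract notes that a soft F1 loss can act as an implicit non-maximum suppression for FOMO. As a point of reference, a common soft-F1 formulation replaces hard true/false-positive counts with probability-weighted sums so the score becomes differentiable; the exact loss used in the paper may differ, so the NumPy sketch below (function name and epsilon choice are illustrative assumptions) shows only the general idea:

```python
import numpy as np

def soft_f1_loss(y_true, y_pred, eps=1e-8):
    """Differentiable F1 loss over per-cell object probabilities.

    y_true: binary ground-truth grid (1 = object present in cell).
    y_pred: predicted probabilities in [0, 1] for the same grid.
    Note: this is a generic soft-F1 sketch, not the paper's exact loss.
    """
    # Replace hard counts with probability-weighted soft counts.
    tp = np.sum(y_true * y_pred)            # soft true positives
    fp = np.sum((1.0 - y_true) * y_pred)    # soft false positives
    fn = np.sum(y_true * (1.0 - y_pred))    # soft false negatives
    soft_f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return 1.0 - soft_f1  # minimize loss = maximize soft F1

# Example: confident, sparse predictions score a lower loss than
# diffuse ones, which is why optimizing soft F1 discourages the
# duplicate nearby activations that NMS would otherwise prune.
labels = np.array([1.0, 0.0, 1.0, 0.0])
sharp = np.array([0.9, 0.1, 0.8, 0.1])
diffuse = np.array([0.6, 0.5, 0.6, 0.5])
```

Unlike per-cell binary cross-entropy, which scores each grid cell independently, the soft F1 is a set-level objective: every spurious high-probability cell directly inflates the soft false-positive term, which plausibly explains the suppression effect reported in the abstract.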