This paper introduces TinyissimoYOLO, a highly flexible, quantized, memory-efficient, and ultra-lightweight object detection network. It aims to enable object detection on microcontrollers in the milliwatt power domain, with less than 0.5 MB of memory available for storing convolutional neural network (CNN) weights. The proposed quantized network architecture, with 422k parameters, enables real-time object detection on embedded microcontrollers and has been evaluated on platforms with CNN accelerators. In particular, the network has been deployed on the MAX78000 microcontroller, achieving a frame rate of up to 180 fps and an ultra-low energy consumption of only 196 µJ per inference, with an inference efficiency of more than 106 MAC/cycle. TinyissimoYOLO can be trained for any multi-object detection task. However, because adding object detection classes increases the size and memory consumption of an already small network, object detection with up to 3 classes is demonstrated. Furthermore, the network is trained using quantization-aware training and deployed with 8-bit quantization on different microcontrollers, namely the STM32H7A3, STM32L4R9, and Apollo4b, and on the MAX78000's CNN accelerator. Performance evaluations are presented in this paper.
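The figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration, not part of the paper's evaluation. It assumes that every one of the 422k parameters is stored as a single 8-bit weight, that "0.5 MB" means decimal megabytes, and that the 180 fps and 196 µJ figures refer to the same operating point with inferences running back to back. The abstract does not state any of these, and the function names are hypothetical.

```python
# Back-of-the-envelope checks on the figures quoted in the abstract.
# All constants come from the abstract; how they are combined is an assumption.

PARAMS = 422_000                   # network parameters
BITS_PER_WEIGHT = 8                # 8-bit quantization
WEIGHT_BUDGET_BYTES = 500_000      # "less than 0.5 MB", read as decimal MB (assumption)
FPS = 180                          # peak frame rate on the MAX78000
ENERGY_PER_INFERENCE_J = 196e-6    # 196 microjoules per inference


def weight_bytes(params: int, bits_per_weight: int) -> int:
    """Storage needed for the weights, assuming one fixed-width value per parameter."""
    return params * bits_per_weight // 8


def average_power_w(energy_per_inference_j: float, fps: float) -> float:
    """Average inference power if inferences run back to back at the given frame rate."""
    return energy_per_inference_j * fps


if __name__ == "__main__":
    size = weight_bytes(PARAMS, BITS_PER_WEIGHT)
    power = average_power_w(ENERGY_PER_INFERENCE_J, FPS)
    print(f"weight storage: {size} bytes (fits budget: {size < WEIGHT_BUDGET_BYTES})")
    print(f"average inference power at {FPS} fps: {power * 1e3:.2f} mW")
```

Under these assumptions the weights occupy roughly 422 kB, which fits the stated budget, and continuous inference at the peak frame rate draws on the order of 35 mW, which is consistent with the milliwatt power domain the abstract targets.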