This paper addresses the growing interest in deploying deep learning models directly in-sensor. We present "Q-Segment", a quantized real-time segmentation algorithm, and conduct a comprehensive evaluation on a low-power edge vision platform with an in-sensor processor, the Sony IMX500. One of the main goals of the model is to achieve end-to-end image segmentation for vessel-based medical diagnosis. Deployed on the IMX500 platform, Q-Segment achieves an ultra-low in-sensor inference time of only 0.23 ms and a power consumption of only 72 mW. We compare the proposed network with state-of-the-art models, both float and quantized, demonstrating that the proposed solution outperforms existing networks on various platforms in computing efficiency, e.g., by a factor of 75 compared to ERFNet. The network employs an encoder-decoder structure with skip connections and achieves a binary accuracy of 97.25% and an Area Under the Receiver Operating Characteristic Curve (AUC) of 96.97% on the CHASE dataset. We also compare the IMX500 processing core with the Sony Spresense, a low-power multi-core ARM Cortex-M microcontroller, and with a single-core ARM Cortex-M4, showing that the IMX500 can achieve in-sensor processing with low end-to-end latency (17 ms) and power consumption (254 mW). This research contributes valuable insights into edge-based image segmentation, laying the foundation for efficient algorithms tailored to low-power environments.
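For readers unfamiliar with the two quality metrics quoted above, the following minimal sketch shows how a per-pixel binary accuracy and an ROC AUC can be computed for a binary vessel mask. It is an illustration only, not the evaluation code used for Q-Segment: the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions introduced here.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of pixels whose thresholded prediction matches the label.

    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(int(p >= threshold) == t for t, p in zip(y_true, y_prob))
    return correct / len(y_true)


def roc_auc(y_true, y_prob):
    """ROC AUC as the probability that a random vessel pixel is scored
    higher than a random background pixel (ties count as one half).

    This pairwise form is quadratic in the number of pixels, so it is
    only suitable for small illustrative inputs.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one pixel of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical flattened 8-pixel mask: 1 = vessel, 0 = background.
labels = [0, 0, 1, 1, 0, 1, 0, 0]
probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.05]

acc = binary_accuracy(labels, probs)
auc = roc_auc(labels, probs)
print(f"binary accuracy = {acc:.4f}, AUC = {auc:.4f}")
```

Binary accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two figures are typically reported together for imbalanced tasks such as vessel segmentation.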