Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO

Smart glasses are rapidly gaining advanced functionality thanks to cutting-edge computing technologies, accelerated hardware architectures, and tiny AI algorithms. Integrating AI into smart glasses, with their small form factor and limited battery capacity, remains challenging when targeting full-day usage for a satisfactory user experience. This paper illustrates the design and implementation of tiny machine-learning algorithms that exploit novel low-power processors to enable prolonged continuous operation in smart glasses. We explore the energy and latency efficiency of smart glasses for real-time object detection. To this end, we designed a smart glasses prototype as a research platform featuring two microcontrollers: a novel milliwatt-power RISC-V parallel processor with a hardware accelerator for visual AI, and a Bluetooth Low Energy module for communication. The smart glasses integrate power-cycling mechanisms, including image and audio sensing interfaces. Furthermore, we developed a family of novel tiny deep-learning models based on YOLO, with sub-million parameter counts and customized for microcontroller-based inference, dubbed TinyissimoYOLO v1.3, v5, and v8, to benchmark the energy and latency of object detection on smart glasses. Evaluations on the smart glasses prototype show that TinyissimoYOLO achieves an inference latency of 17 ms and an energy consumption of 1.59 mJ per inference while maintaining acceptable detection accuracy. Further evaluation reveals an end-to-end latency, from image capture to the algorithm's prediction, of 56 ms (equivalently 18 fps), with a total power consumption of 62.9 mW, which corresponds to 9.3 hours of continuous operation on a 154 mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 fps.
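The headline figures above are related by simple arithmetic. The following Python sketch recomputes them from the reported values. The 3.8 V nominal battery voltage is an assumption: the abstract gives only the 154 mAh capacity, and 3.8 V is simply the value that makes the reported 62.9 mW and 9.3 h consistent. The constant and function names are illustrative, not taken from the paper.

```python
# Back-of-the-envelope check of the figures reported in the abstract.
# ASSUMPTION: the 154 mAh cell has a nominal voltage of 3.8 V. The abstract
# does not state the voltage; 3.8 V is the value that makes the reported
# 62.9 mW system power and 9.3 h runtime consistent with each other.

INFERENCE_LATENCY_S = 17e-3    # 17 ms per inference (reported)
INFERENCE_ENERGY_J = 1.59e-3   # 1.59 mJ per inference (reported)
END_TO_END_LATENCY_S = 56e-3   # image capture -> prediction (reported)
SYSTEM_POWER_W = 62.9e-3       # total system power (reported)
BATTERY_MAH = 154.0            # battery capacity (reported)
BATTERY_V = 3.8                # ASSUMED nominal cell voltage
MCUNET_FPS = 7.3               # MCUNet image classification rate (reported)


def average_inference_power_w() -> float:
    """Mean power drawn while the network runs: energy divided by time."""
    return INFERENCE_ENERGY_J / INFERENCE_LATENCY_S


def end_to_end_fps() -> float:
    """Frame rate if one frame completes every end-to-end latency period."""
    return 1.0 / END_TO_END_LATENCY_S


def runtime_hours() -> float:
    """Battery energy (Wh) divided by continuous system power (W)."""
    battery_wh = BATTERY_MAH / 1000.0 * BATTERY_V
    return battery_wh / SYSTEM_POWER_W


if __name__ == "__main__":
    print(f"inference power   : {average_inference_power_w() * 1e3:.1f} mW")
    print(f"end-to-end rate   : {end_to_end_fps():.1f} fps")
    print(f"runtime           : {runtime_hours():.2f} h")
    print(f"ratio vs MCUNet   : {end_to_end_fps() / MCUNET_FPS:.2f}x")
```

Run as a script, it prints about 93.5 mW, 17.9 fps (rounded to 18 fps in the abstract), 9.30 h, and a 2.45x ratio. The 93.5 mW value is the mean power during the 17 ms inference window only, so it is not directly comparable with the 62.9 mW total system power; likewise, the MCUNet ratio compares different tasks and platforms and is only a rough indication.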