Embodied artificial intelligence (AI) requires pushing complex multi-modal models to the extreme edge for time-constrained tasks such as autonomous navigation of robots and vehicles. On small form-factor devices, e.g., nano-sized unmanned aerial vehicles (UAVs), these challenges are exacerbated by stringent constraints on energy efficiency and weight. In this paper, we explore embodied multi-modal AI-based perception for nano-UAVs with the Kraken shield, a 7 g multi-sensor board carrying frame-based and event-based imagers and built around Kraken, a 22 nm SoC featuring multiple acceleration engines for multi-modal inference: spiking neural networks (SNNs) for event data and ternary neural networks (TNNs) for frames. Kraken can execute real-time SNN inference for depth estimation at 1.02k inf/s and 18 µJ/inf, real-time TNN inference for object classification at 10k inf/s and 6 µJ/inf, and real-time inference for obstacle avoidance at 221 frame/s and 750 µJ/inf.
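As a quick sanity check on the reported figures, the average power implied by each workload follows directly from power = throughput × energy per inference. The sketch below uses only the numbers stated in the abstract; the function name and structure are illustrative, not part of the Kraken software stack.

```python
# Illustrative arithmetic: average power implied by the abstract's
# throughput (inf/s) and energy-per-inference (uJ/inf) figures.
def avg_power_mw(inf_per_s: float, uj_per_inf: float) -> float:
    """Average power in milliwatts: (inf/s) * (uJ/inf) gives uJ/s; /1000 -> mW."""
    return inf_per_s * uj_per_inf * 1e-3

snn_depth = avg_power_mw(1020, 18)       # SNN depth estimation: ~18.4 mW
tnn_classify = avg_power_mw(10_000, 6)   # TNN object classification: 60 mW
avoidance = avg_power_mw(221, 750)       # obstacle avoidance: ~165.8 mW

print(f"SNN depth estimation:   {snn_depth:.2f} mW")
print(f"TNN classification:     {tnn_classify:.2f} mW")
print(f"Obstacle avoidance:     {avoidance:.2f} mW")
```

All three workloads land in the tens-to-hundreds of milliwatts range, consistent with the power budget of a nano-UAV platform.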