Low-power edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of the Metaverse. In this work, we investigate two representative XR workloads, (i) hand detection and (ii) eye segmentation, for hardware design space exploration. For both applications, we train deep neural networks and analyze the impact of quantization and hardware-specific bottlenecks. Through simulations, we evaluate a CPU and two systolic inference accelerator implementations. Next, we compare these hardware solutions at advanced technology nodes. We then evaluate the impact of integrating state-of-the-art emerging non-volatile memory technology (STT/SOT/VGSOT MRAM) into the XR-AI inference pipeline. We find that, for designs at the 7 nm node, introducing non-volatile memory into the memory hierarchy achieves significant energy benefits (>=24%) while meeting the minimum inferences-per-second (IPS) requirement for both hand detection (IPS=10) and eye segmentation (IPS=0.1). Moreover, we can realize a substantial reduction in area (>=30%) owing to the small form factor of MRAM compared to traditional SRAM.
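To make the "energy benefit while meeting minimum IPS" comparison concrete, the following is a minimal sketch of one common way to reason about it: a duty-cycle model in which a design spends part of each second running inferences and the rest idle. This is not necessarily the model used in this work. The `MemoryDesign` fields, the two example designs, and every numeric value are hypothetical placeholders chosen only for illustration; the sketch assumes a non-volatile memory lets the design draw less idle power at the cost of somewhat higher per-inference energy and latency.

```python
from dataclasses import dataclass


@dataclass
class MemoryDesign:
    """Hypothetical per-design parameters; none of these values come from the paper."""
    name: str
    energy_per_inference_j: float  # dynamic energy of one inference (J)
    latency_s: float               # time to run one inference (s)
    idle_power_w: float            # power drawn while idle between inferences (W)


def meets_ips(design: MemoryDesign, target_ips: float) -> bool:
    """A design meets the target if back-to-back inferences fit within one second."""
    return design.latency_s * target_ips <= 1.0


def energy_per_second(design: MemoryDesign, target_ips: float) -> float:
    """Energy over one second: active inferences plus idle energy in the gaps."""
    if not meets_ips(design, target_ips):
        raise ValueError(f"{design.name} cannot sustain {target_ips} IPS")
    active_time = design.latency_s * target_ips
    idle_time = 1.0 - active_time
    return (design.energy_per_inference_j * target_ips
            + design.idle_power_w * idle_time)


def saving_percent(baseline: MemoryDesign, candidate: MemoryDesign,
                   target_ips: float) -> float:
    """Relative energy saving of candidate over baseline at the same IPS."""
    e_base = energy_per_second(baseline, target_ips)
    e_cand = energy_per_second(candidate, target_ips)
    return 100.0 * (e_base - e_cand) / e_base


# Placeholder designs (assumed values, for illustration only).
SRAM_ONLY = MemoryDesign("sram-only", energy_per_inference_j=1.0e-3,
                         latency_s=0.02, idle_power_w=5.0e-3)
WITH_MRAM = MemoryDesign("with-mram", energy_per_inference_j=1.2e-3,
                         latency_s=0.03, idle_power_w=5.0e-4)

if __name__ == "__main__":
    # IPS targets taken from the abstract: 10 (hand detection), 0.1 (eye segmentation).
    for ips in (10.0, 0.1):
        print(f"IPS={ips}: saving = {saving_percent(SRAM_ONLY, WITH_MRAM, ips):.1f}%")
```

Under these assumed parameters, the relative saving grows as the IPS target drops, because idle energy then accounts for a larger share of the total. Actual figures depend on the simulated hardware and memory technology.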