The rise of power-efficient embedded computers based on highly parallel accelerators has opened a number of opportunities and challenges for researchers and engineers, and has paved the way to the era of edge computing. At the same time, advances in embedded AI for object detection and categorization, such as YOLO, GoogLeNet and AlexNet, have reached an unprecedented level of accuracy (mean Average Precision, mAP) and performance (frames per second, FPS). Today, edge computers based on heterogeneous many-core systems are a predominant choice for deploying such networks in Industry 4.0, wearable devices, and (our focus) autonomous driving systems. In the latter, engineers struggle to reconcile the tight power and size budgets of automotive platforms with the accuracy and performance targets that autonomous driving demands. We aim to validate the effectiveness and efficiency of the most recent networks on state-of-the-art platforms with embedded commercial off-the-shelf (COTS) Systems-on-Chip, such as Xavier AGX, Tegra X2 and Nano from NVIDIA, and XCZU9EG and XCZU3EG of the Zynq UltraScale+ family from Xilinx. Our work aims to support engineers in choosing the most appropriate CNN package and computing system for their designs, and in deriving guidelines for adequately sizing their systems.