Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System

Innovative enhancements in embedded system platforms, specifically hardware acceleration, significantly influence the application of deep learning in real-world scenarios. These innovations turn manual human effort into automated intelligent systems used in areas such as autonomous driving, robotics, the Internet of Things (IoT), and many other impactful applications. NVIDIA's Jetson platform is one of the pioneers in offering optimal performance in terms of energy efficiency and throughput when executing deep learning algorithms. Previous benchmarking analyses were mostly based on 2D images, with a single deep learning model for each comparison result. In this paper, we implement an end-to-end video-based crime-scene anomaly detection system that takes surveillance videos as input and is deployed and operates entirely on multiple Jetson edge devices (Nano, AGX Xavier, Orin Nano). The comparison analysis includes the integration of Torch-TensorRT, a software development kit from NVIDIA, for model performance optimisation. The system is built on the PySlowFast open-source project from Facebook, which serves as the coding template. The end-to-end process comprises video input from a camera, a data preprocessing pipeline, a feature extractor, and the anomaly detector. We share our experience of deploying an AI-based system on various Jetson edge devices with Docker. For the anomaly detector, a weakly supervised video-based deep learning model called Robust Temporal Feature Magnitude Learning (RTFM) is applied. The proposed system reaches an inference speed of 47.56 frames per second (FPS) on a Jetson edge device with only 3.11 GB of total RAM usage. We also identify a promising Jetson device on which the AI system achieves 15% better performance than on the previous generation of Jetson devices while consuming 50% less power.
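To make the reported throughput metric concrete, the following is a minimal sketch of how frames per second could be measured across a preprocessing, feature extraction, and anomaly scoring chain. It is not the system described in this paper: `preprocess`, `extract_features`, `score_anomaly`, `run_pipeline`, and `measure_fps` are hypothetical stand-ins written in plain Python, and the scoring rule is a placeholder rather than RTFM. A real deployment would substitute the actual models and would also need to synchronise the GPU before reading the clock.

```python
import time
from typing import Callable, List, Sequence

Frame = List[float]  # hypothetical stand-in for one decoded video frame


def preprocess(frame: Frame) -> Frame:
    """Placeholder preprocessing: scale 8-bit pixel values to [0, 1]."""
    return [p / 255.0 for p in frame]


def extract_features(clip: Sequence[Frame]) -> List[float]:
    """Placeholder feature extractor: mean intensity of each frame."""
    return [sum(f) / len(f) for f in clip]


def score_anomaly(features: Sequence[float]) -> float:
    """Placeholder detector (not RTFM): largest feature magnitude in the clip."""
    return max(abs(x) for x in features)


def run_pipeline(clip: Sequence[Frame]) -> float:
    """Chain the three stages for one clip and return its anomaly score."""
    return score_anomaly(extract_features([preprocess(f) for f in clip]))


def measure_fps(pipeline: Callable[[Sequence[Frame]], float],
                clips: Sequence[Sequence[Frame]],
                warmup: int = 1) -> float:
    """Return processed frames per second, excluding `warmup` initial clips."""
    for clip in clips[:warmup]:
        pipeline(clip)  # warm-up runs are executed but not timed
    timed = clips[warmup:]
    n_frames = sum(len(c) for c in timed)
    start = time.perf_counter()
    for clip in timed:
        pipeline(clip)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return n_frames / max(elapsed, 1e-9)


# Three synthetic clips of four two-pixel frames each.
clips = [[[0.0, 255.0] for _ in range(4)] for _ in range(3)]
fps = measure_fps(run_pipeline, clips)
```

Excluding warm-up clips from the timed region matters on accelerators, where the first inference calls typically include one-off initialisation costs that would otherwise distort the FPS figure.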
