The proliferation of complex deep learning (DL) models has revolutionized many applications, including computer-vision-based solutions, prompting their integration into real-time systems. However, the resource-intensive nature of these models makes them difficult to deploy on devices with limited compute and memory, such as embedded and edge devices. This work empirically investigates the optimization of such complex DL models to analyze how they perform on an embedded device, specifically the NVIDIA Jetson Nano. It evaluates the effectiveness of the optimized models in terms of inference speed for image classification and video action detection. The experimental results reveal that, on average, optimized models run 16.11% faster than their non-optimized counterparts. This finding not only emphasizes the need to account for hardware constraints and environmental sustainability during model development and deployment, but also underscores the pivotal role of model optimization in enabling the widespread adoption of AI-assisted technologies on resource-constrained computational systems. It further demonstrates that prioritizing hardware-specific model optimization yields efficient, scalable solutions that substantially reduce energy consumption and carbon footprint.