This study introduces a lightweight U-Net model optimized for real-time semantic segmentation of aerial images on Commercial Off-The-Shelf (COTS) embedded computing platforms. We maintain the accuracy of the original U-Net on a real-world dataset while reducing both the parameter count and the number of Multiply-Accumulate (MAC) operations by a factor of 16. Our analysis covers three hardware platforms (CPU, GPU, and FPGA) and five toolchains (TVM, FINN, Vitis AI, TensorFlow GPU, and cuDNN), assessing each on latency, power consumption, memory footprint, energy efficiency, and FPGA resource usage. The results highlight the trade-offs between these platforms and toolchains, with a particular focus on the practical challenges of real-world deployment. Our findings show that the FPGA with Vitis AI emerges as the superior choice for its performance, energy efficiency, and maturity, but that it requires specialized hardware knowledge, which underlines the need for a balanced approach when selecting embedded computing solutions for semantic segmentation tasks.
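To give a sense of scale for the factor-of-16 reduction, the sketch below counts parameters and MACs for a toy U-Net-style encoder at two channel widths. It is an illustration only: the abstract does not say how the reduction was obtained, so the width-scaling strategy, the 256x256 input size, the base widths of 64 and 16, and the helper names `conv_cost` and `encoder_cost` are all assumptions rather than the authors' method.

```python
def conv_cost(h, w, c_in, c_out, k=3):
    """Return (parameters, MACs) of a k x k convolution with 'same' padding.

    Bias terms are ignored; each output pixel needs k*k*c_in*c_out MACs.
    """
    params = k * k * c_in * c_out
    macs = h * w * params
    return params, macs


def encoder_cost(base_width, size=256, depth=4, in_channels=3):
    """Sum the cost of a toy encoder (hypothetical layout, not the paper's).

    Each level has two 3x3 convolutions; the channel width doubles and the
    spatial resolution halves between levels.
    """
    total_params = 0
    total_macs = 0
    c_in, c_out, s = in_channels, base_width, size
    for _ in range(depth):
        for _ in range(2):
            params, macs = conv_cost(s, s, c_in, c_out)
            total_params += params
            total_macs += macs
            c_in = c_out
        c_out *= 2
        s //= 2
    return total_params, total_macs


# Assumed widths: 64 for the baseline, 16 for the slimmed model (4x narrower).
full_params, full_macs = encoder_cost(base_width=64)
slim_params, slim_macs = encoder_cost(base_width=16)

param_ratio = full_params / slim_params
mac_ratio = full_macs / slim_macs
print(f"parameter reduction: {param_ratio:.2f}x")
print(f"MAC reduction:       {mac_ratio:.2f}x")
```

Under these assumptions, dividing every channel width by 4 shrinks each interior convolution by 4 x 4 = 16, since its cost scales with the product of input and output channels. The overall ratio falls slightly short of 16 because the first layer's input channel count is fixed by the image.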