JetSeg: Efficient Real-Time Semantic Segmentation Model for Low-Power GPU-Embedded Systems

Real-time semantic segmentation is a challenging task that requires high-accuracy models with low inference times. Deploying these models on embedded systems is constrained by hardware capability and memory usage, which create bottlenecks. We propose JetSeg, an efficient model for real-time semantic segmentation consisting of an encoder called JetNet and an improved RegSeg decoder. JetNet is designed for GPU-embedded systems and includes two main components: a new lightweight, efficient block called JetBlock, which reduces the number of parameters and thereby minimizes memory usage and inference time without sacrificing accuracy; and a new strategy that combines asymmetric and non-asymmetric convolutions with depthwise-dilated convolutions, called JetConv, together with a channel shuffle operation, lightweight activation functions, and a number of group convolutions suited to embedded systems. We also introduce a loss function named JetLoss, which integrates the Precision, Recall, and IoUB losses to improve semantic segmentation and reduce computational complexity. Experiments demonstrate that JetSeg is much faster on workstation devices and more suitable for low-power GPU-embedded systems than existing state-of-the-art models for real-time semantic segmentation. Compared with state-of-the-art real-time encoder-decoder models, our approach reduces the parameter count by 46.70M and GFLOPs by 5.14%, which makes JetSeg up to 2x faster than other models on the NVIDIA Titan RTX GPU and the Jetson Xavier. The JetSeg code is available at https://github.com/mmontielpz/jetseg.
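
To make two of the ingredients named in the abstract concrete, the following is a minimal pure-Python sketch of a channel shuffle and of the weight savings from asymmetric depthwise convolutions. It is not taken from the JetSeg repository: the function names `channel_shuffle` and `conv_weights`, the 64-channel layer width, and the 3x1 plus 1x3 factorization are illustrative assumptions, and the actual JetConv layout may differ.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (generic channel shuffle, assumed layout)."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # View the flat list as a (groups, per_group) grid and read it column by column.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def conv_weights(c_in, c_out, kernel_h, kernel_w, groups=1):
    """Number of weights in a 2D convolution, ignoring bias terms."""
    return (c_in // groups) * c_out * kernel_h * kernel_w


# Hypothetical layer width, chosen only for illustration.
C = 64

# Standard 3x3 convolution versus a depthwise asymmetric pair (3x1 followed by 1x3).
full_3x3 = conv_weights(C, C, 3, 3)
asym_depthwise = (conv_weights(C, C, 3, 1, groups=C)
                  + conv_weights(C, C, 1, 3, groups=C))

shuffled = channel_shuffle(list(range(6)), groups=2)
```

Depthwise convolutions keep each channel isolated, so a shuffle step is commonly used to let information mix across groups; the weight counts show why such factorized layers are attractive when memory is tight.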
