Object detection and segmentation are two core modules of an autonomous vehicle perception system. They should deliver high accuracy and low latency at low computational complexity. Currently, the most commonly used algorithms are based on deep neural networks, which provide high accuracy but require high-performance computing platforms. Autonomous vehicles, meaning cars but also drones, have to rely on embedded platforms with limited computing power, which makes these requirements difficult to meet. The complexity of a network can be reduced by choosing an appropriate architecture, representation (reduced numerical precision, quantisation, pruning), and computing platform. In this paper, we focus on the first factor: the use of so-called detection-segmentation networks as a component of a perception system. We consider the task of segmenting the drivable area and road markings in combination with the detection of selected objects (pedestrians, traffic lights, and obstacles). We compare the performance of three architectures described in the literature: MultiTask V3, HybridNets, and YOLOP. The experiments were conducted on a custom dataset consisting of approximately 500 images for drivable area and lane marking segmentation and 250 images for object detection. Of the three methods analysed, MultiTask V3 proved to be the best, achieving 99% mAP_50 for detection, 97% MIoU for drivable area segmentation, and 91% MIoU for lane segmentation, as well as 124 fps on an RTX 3060 graphics card. This architecture is a good solution for embedded perception systems for autonomous vehicles. The code is available at: https://github.com/vision-agh/MMAR_2023.
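For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of how a mean intersection over union (MIoU) score can be computed for a pair of label masks. It is an illustration only: the function name `mean_iou`, the nested-list mask representation, and the choice to average over classes present in either mask are assumptions made here, not details taken from the paper or its repository.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean IoU over the classes that occur in the prediction or the ground truth.

    Hypothetical helper for illustration; masks are 2D grids of integer class ids
    in the range [0, num_classes).
    """
    intersections = [0] * num_classes
    unions = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel belongs to both the intersection and the union of class p.
                intersections[p] += 1
                unions[p] += 1
            else:
                # Mismatch: the pixel enlarges the union of both classes involved.
                unions[p] += 1
                unions[t] += 1
    # Assumption: classes absent from both masks are skipped rather than scored.
    ious = [i / u for i, u in zip(intersections, unions) if u > 0]
    if not ious:
        raise ValueError("no class present in either mask")
    return sum(ious) / len(ious)


# Toy 2x3 masks: class 0 = background, class 1 = drivable area.
pred_mask = [[0, 0, 1], [0, 1, 1]]
true_mask = [[0, 0, 1], [0, 0, 1]]
score = mean_iou(pred_mask, true_mask, num_classes=2)
print(f"MIoU = {score:.4f}")
```

In this toy case class 0 has an IoU of 3/4 and class 1 an IoU of 2/3, so the mean is their average. A real evaluation would typically accumulate intersections and unions over the whole test set before dividing; whether the paper does so is not stated in the abstract.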