Camera-based autonomous driving technology has advanced rapidly, but it often prioritizes effectiveness while paying limited attention to computational efficiency. To address this gap, this paper introduces LRHPerception, a real-time monocular perception package for autonomous driving that interprets the surrounding environment from single-view camera video. The proposed system combines the computational efficiency of end-to-end learning with the rich representational detail of local mapping methodologies. It integrates significant improvements in object tracking and prediction, road segmentation, and depth estimation into a unified framework, and processes monocular image data into a five-channel tensor (three RGB channels, one road segmentation channel, and one pixel-level depth channel), augmented with object detection and trajectory prediction. Experimental results show real-time processing at 29 FPS on a single GPU, a 555% speedup over the fastest mapping-based approach.
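To make the five-channel output concrete, the following is a minimal sketch of how RGB, road segmentation, and pixel-level depth could be stacked into a single tensor. It is not the LRHPerception implementation: the function name `pack_perception_tensor`, the channel order, the float32 dtype, and the scaling of RGB to [0, 1] are all assumptions made for illustration, and the inputs are synthetic. The abstract does not specify how detection and trajectory outputs are attached, so they are left out here.

```python
import numpy as np


def pack_perception_tensor(rgb, road_mask, depth):
    """Stack RGB (H, W, 3), a road mask (H, W), and depth (H, W) into (H, W, 5).

    Hypothetical helper: the channel order [R, G, B, road, depth] is an
    assumption, not a layout confirmed by the paper.
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    h, w = rgb.shape[:2]
    if road_mask.shape != (h, w) or depth.shape != (h, w):
        raise ValueError("road_mask and depth must have shape (H, W)")

    # Assumption: 8-bit RGB input, scaled to [0, 1].
    rgb_f = rgb.astype(np.float32) / 255.0
    # Road mask becomes a single 0/1 channel.
    road_f = road_mask.astype(np.float32)[..., None]
    # Depth is kept in its original units as a single channel.
    depth_f = depth.astype(np.float32)[..., None]

    return np.concatenate([rgb_f, road_f, depth_f], axis=2)


# Synthetic inputs standing in for one camera frame and its model outputs.
h, w = 4, 6
rgb = np.full((h, w, 3), 255, dtype=np.uint8)
road_mask = np.zeros((h, w), dtype=bool)
road_mask[2:, :] = True  # lower half of the frame marked as road
depth = np.full((h, w), 12.5, dtype=np.float32)

tensor = pack_perception_tensor(rgb, road_mask, depth)
print(tensor.shape)  # (4, 6, 5)
```

Keeping the three modalities in one array means a downstream planner can index a pixel once and read its color, road membership, and depth together. Per-object outputs such as detections and predicted trajectories are variable-length, so in a sketch like this they would more naturally travel alongside the tensor than inside it.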