Concurrently processing multiple autonomous driving 3D perception tasks within the same spatiotemporal scene is challenging, in particular because traditional multi-task learning approaches suffer from computational inefficiency and feature competition between tasks. This paper addresses these issues by proposing RepVF, a novel unified representation that harmonizes the representation of perception tasks such as 3D object detection and 3D lane detection within a single framework. RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model that significantly reduces computational redundancy and feature competition. Building upon RepVF, we introduce RFTR, a network that exploits the inherent connections between tasks through a hierarchical structure of queries that implicitly models the relationships both between and within tasks. This approach eliminates the need for task-specific heads and parameters, fundamentally reducing the conflicts inherent in traditional multi-task learning paradigms. We validate our approach by combining labels from the OpenLane dataset with the Waymo Open dataset. Our work improves the efficiency and effectiveness of multi-task perception in autonomous driving and offers a new perspective on handling multiple 3D perception tasks synchronously and in parallel. The code will be available at: https://github.com/jbji/RepVF
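To make the idea of a shared vector-field representation concrete, the following is a minimal sketch and not the paper's actual parameterization. It assumes, purely for illustration, that each target is encoded as a fixed number of 3D sample points, each paired with a unit direction vector: a lane uses its polyline vertices and tangents, and a box uses its four bottom corners with the heading direction. The names `FieldTarget`, `lane_to_field`, `box_to_field`, and `flatten` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class FieldTarget:
    """Hypothetical unified target: sample points, each with a unit direction."""
    label: str
    points: List[Vec3]
    directions: List[Vec3]


def _unit(v: Vec3) -> Vec3:
    """Normalize a 3D vector; a zero vector is returned unchanged."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        return (0.0, 0.0, 0.0)
    return (v[0] / n, v[1] / n, v[2] / n)


def lane_to_field(polyline: List[Vec3]) -> FieldTarget:
    """Assumed lane encoding: vertices as points, unit tangents as directions."""
    last = len(polyline) - 1
    dirs = []
    for i in range(len(polyline)):
        a = polyline[max(i - 1, 0)]
        b = polyline[min(i + 1, last)]
        dirs.append(_unit((b[0] - a[0], b[1] - a[1], b[2] - a[2])))
    return FieldTarget("lane", list(polyline), dirs)


def box_to_field(center: Vec3, size: Vec3, yaw: float) -> FieldTarget:
    """Assumed box encoding: bottom corners as points, heading as direction."""
    cx, cy, cz = center
    length, width, height = size
    c, s = math.cos(yaw), math.sin(yaw)
    pts = []
    for sx, sy in ((1, 1), (1, -1), (-1, -1), (-1, 1)):
        dx, dy = sx * length / 2, sy * width / 2
        # Rotate the corner offset by yaw, then translate to the box center.
        pts.append((cx + c * dx - s * dy, cy + s * dx + c * dy, cz - height / 2))
    heading = (c, s, 0.0)
    return FieldTarget("object", pts, [heading] * 4)


def flatten(target: FieldTarget) -> List[float]:
    """Concatenate (x, y, z, dx, dy, dz) per point into one regression vector."""
    out: List[float] = []
    for p, d in zip(target.points, target.directions):
        out.extend(p)
        out.extend(d)
    return out


lane = lane_to_field([(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)])
box = box_to_field((10, 2, 1), (4, 2, 2), 0.0)
# Both targets flatten to the same length, so one head could regress either.
same_shape = len(flatten(lane)) == len(flatten(box))
```

Under these assumptions both a lane and a box reduce to a vector of identical length, which is the property that would let a single prediction head serve both tasks without task-specific output layers.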