An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection

Accurately detecting lane lines in 3D space is crucial for autonomous driving. Existing methods usually first transform image-view features into a bird's-eye view (BEV) using inverse perspective mapping (IPM), and then detect lane lines from the BEV features. However, IPM ignores changes in road height, leading to inaccurate view transformations. Additionally, splitting the process into two separate stages can cause cumulative errors and added complexity. To address these limitations, we propose an efficient transformer for 3D lane detection. Unlike the vanilla transformer, our model contains a decomposed cross-attention mechanism that simultaneously learns lane and BEV representations. The mechanism decomposes the cross-attention between image-view and BEV features into two parts: one between image-view and lane features, and one between lane and BEV features, both of which are supervised with ground-truth lane lines. Our method obtains 2D and 3D lane predictions by applying the lane features to the image-view and BEV features, respectively. This allows for a more accurate view transformation than IPM-based methods, because the view transformation is learned from data through supervised cross-attention. Additionally, the cross-attention between lane and BEV features enables them to adjust to each other, resulting in more accurate lane detection than a pipeline with two separate stages. Finally, the decomposed cross-attention is more efficient than the original cross-attention. Experimental results on OpenLane and ONCE-3DLanes demonstrate the state-of-the-art performance of our method.
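
The decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain single-head scaled dot-product attention, omits learned projections, positional embeddings, and the ground-truth supervision, and updates features in one direction only (image to lane, then lane to BEV). All names (`cross_attention`, `decomposed_cross_attention`, `lane_scores`, `pair_counts`) are hypothetical, and the dot-product scoring used to "apply" lane features is an assumed form.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention on lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def decomposed_cross_attention(image_feats, lane_queries, bev_queries):
    """Assumed two-step decomposition: image -> lane, then lane -> BEV."""
    # Step 1: each lane query gathers evidence from the image-view features.
    lane_feats = cross_attention(lane_queries, image_feats, image_feats)
    # Step 2: each BEV query is updated from the (few) lane features,
    # so BEV positions never attend to image positions directly.
    bev_feats = cross_attention(bev_queries, lane_feats, lane_feats)
    return lane_feats, bev_feats


def lane_scores(lane_feats, feats):
    """Dot every lane feature with every position feature (one map per lane)."""
    return [[sum(a * b for a, b in zip(lane, f)) for f in feats]
            for lane in lane_feats]


def pair_counts(n_img, n_bev, n_lane):
    """Query-key pairs scored by direct vs. decomposed cross-attention."""
    return n_img * n_bev, n_lane * (n_img + n_bev)


def random_feats(n, d):
    """Random stand-in features; a real model would use backbone outputs."""
    return [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


random.seed(0)
d, n_img, n_bev, n_lane = 8, 48, 60, 4  # toy sizes, chosen for illustration
image_feats = random_feats(n_img, d)
lane_feats, bev_feats = decomposed_cross_attention(
    image_feats, random_feats(n_lane, d), random_feats(n_bev, d))

scores_2d = lane_scores(lane_feats, image_feats)  # per-lane map over image positions
scores_3d = lane_scores(lane_feats, bev_feats)    # per-lane map over BEV positions
print(pair_counts(n_img, n_bev, n_lane))          # (2880, 432)
```

In this sketch the efficiency gain comes from routing all interaction through the small set of lane features: direct image-to-BEV attention scores `n_img * n_bev` pairs, while the decomposed form scores `n_lane * (n_img + n_bev)`, which is smaller whenever the number of lanes is small relative to the number of image and BEV positions.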
