The View Transformation Module (VTM), which performs the transformation between multi-view image features and the Bird's-Eye-View (BEV) representation, is a crucial component of camera-based BEV perception systems. Currently, the two most prominent VTM paradigms are forward projection and backward projection. Forward projection, represented by Lift-Splat-Shoot, produces sparsely projected BEV features unless post-processing is applied. Backward projection, exemplified by BEVFormer, tends to generate false-positive BEV features from incorrect projections because it does not utilize depth information. To address these limitations, we propose a novel forward-backward view transformation module. Our approach compensates for the deficiencies of both existing methods, allowing them to enhance each other and yield higher-quality BEV representations. We instantiate the proposed module with FB-BEV, which achieves a new state-of-the-art result of 62.4\% NDS on the nuScenes test set. The code will be released at \url{https://github.com/NVlabs/FB-BEV}.
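The two failure modes named above can be illustrated with a toy NumPy sketch. This is not the FB-BEV implementation: the function names `forward_project` and `backward_project`, the 1-D image row, the single depth value per pixel (Lift-Splat-Shoot predicts a depth distribution instead), and all camera and grid parameters are assumptions made for illustration only.

```python
import numpy as np

def forward_project(feats, depths, f, cx, grid_x, grid_y, cell):
    """Toy forward projection: splat each pixel into the BEV cell at its depth."""
    bev = np.zeros((grid_x, grid_y))
    hits = np.zeros((grid_x, grid_y), dtype=bool)
    u = np.arange(feats.shape[0])
    x = depths                         # forward distance of each pixel
    y = depths * (u - cx) / f          # lateral offset from the pinhole model
    ix = np.floor(x / cell).astype(int)
    iy = np.floor(y / cell).astype(int) + grid_y // 2
    ok = (ix >= 0) & (ix < grid_x) & (iy >= 0) & (iy < grid_y)
    np.add.at(bev, (ix[ok], iy[ok]), feats[ok])   # accumulate features per cell
    hits[ix[ok], iy[ok]] = True
    return bev, hits

def backward_project(feats, f, cx, grid_x, grid_y, cell):
    """Toy backward projection: each BEV cell samples the pixel it projects to."""
    bev = np.zeros((grid_x, grid_y))
    xs = (np.arange(grid_x) + 0.5) * cell                  # cell centers, forward
    ys = (np.arange(grid_y) - grid_y // 2 + 0.5) * cell    # cell centers, lateral
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    u = np.round(f * Y / X + cx).astype(int)               # projected pixel index
    ok = (u >= 0) & (u < feats.shape[0])
    bev[ok] = feats[u[ok]]             # no depth check: every cell on a ray is filled
    return bev, ok

# Hypothetical scene: 32-pixel image row, a flat wall 20 m ahead,
# and one pixel carrying an "object" feature.
W, f, cx = 32, 16.0, 16.0
feats = np.zeros(W)
feats[20] = 1.0
depths = np.full(W, 20.0)

fwd_bev, fwd_hits = forward_project(feats, depths, f, cx, 32, 32, 1.0)
bwd_bev, bwd_hits = backward_project(feats, f, cx, 32, 32, 1.0)

print("forward: cells filled =", fwd_hits.sum(), "| object cells =", (fwd_bev > 0).sum())
print("backward: cells filled =", bwd_hits.sum(), "| object cells =", (bwd_bev > 0).sum())
```

In this toy setting, forward projection fills at most one BEV cell per pixel, so most of the grid stays empty, while backward projection fills every cell inside the field of view but copies the object feature to all cells along its ray. A forward-backward design, as the abstract describes, aims to combine the two so that each offsets the other's weakness.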