Recent LSS-based (Lift-Splat-Shoot) multi-view 3D object detection has made tremendous progress by processing features in Bird's-Eye-View (BEV) with a convolutional detector. However, standard convolution ignores the radial symmetry of BEV features, which makes the detector harder to optimize. To preserve this inherent property of BEV features and ease optimization, we propose an azimuth-equivariant convolution (AeConv) and an azimuth-equivariant anchor. The sampling grid of AeConv is always aligned with the radial direction, so it can learn azimuth-invariant BEV features. The proposed anchor enables the detection head to learn to predict azimuth-irrelevant targets. In addition, we introduce a camera-decoupled virtual depth to unify depth prediction across images with different camera intrinsic parameters. The resulting detector is dubbed Azimuth-equivariant Detector (AeDet). Extensive experiments on nuScenes show that AeDet achieves 62.0% NDS, surpassing recent multi-view 3D object detectors such as PETRv2 and BEVDepth by a large margin. Project page: https://fcjian.github.io/aedet.
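The core idea behind AeConv, a sampling grid that follows the radial direction at every BEV cell, can be illustrated with a small NumPy sketch. It assumes a single-channel BEV map with the ego vehicle at the grid center, x along columns and y along rows. At each cell it rotates a regular 3×3 kernel by that cell's azimuth θ = atan2(y, x), then gathers features with bilinear interpolation. The function names (`azimuth_rotated_offsets`, `bilinear_sample`, `azimuth_conv2d`) are hypothetical. This illustrates the principle only and is not the authors' implementation.

```python
import numpy as np


def azimuth_rotated_offsets(height, width, kernel_size=3):
    """Per-cell sampling offsets (dx, dy), shape (H, W, K*K, 2), whose local
    u-axis points radially outward from the grid center (assumed ego position)."""
    r = kernel_size // 2
    # Regular kernel offsets in a local frame: u = radial axis, v = tangential axis.
    u, v = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1), indexing="xy")
    base = np.stack([u.ravel(), v.ravel()], axis=-1).astype(np.float64)  # (K*K, 2)

    # Cell coordinates relative to the grid center (hypothetical ego placement).
    ys, xs = np.meshgrid(np.arange(height) - (height - 1) / 2.0,
                         np.arange(width) - (width - 1) / 2.0, indexing="ij")
    theta = np.arctan2(ys, xs)  # azimuth of each BEV cell, shape (H, W)
    cos_t, sin_t = np.cos(theta)[..., None], np.sin(theta)[..., None]

    # Rotate every local offset (u, v) by the cell's azimuth.
    dx = cos_t * base[:, 0] - sin_t * base[:, 1]
    dy = sin_t * base[:, 0] + cos_t * base[:, 1]
    return np.stack([dx, dy], axis=-1)


def bilinear_sample(feat, x, y):
    """Bilinearly sample a 2D map at float coordinates, zero outside the map."""
    h, w = feat.shape
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def get(yy, xx):
        valid = (xx >= 0) & (xx < w) & (yy >= 0) & (yy < h)
        out = np.zeros_like(x, dtype=np.float64)
        out[valid] = feat[yy[valid], xx[valid]]
        return out

    return ((1 - wx) * (1 - wy) * get(y0, x0) + wx * (1 - wy) * get(y0, x1)
            + (1 - wx) * wy * get(y1, x0) + wx * wy * get(y1, x1))


def azimuth_conv2d(feat, weight):
    """Single-channel convolution whose KxK grid is rotated to each cell's azimuth.
    weight[i, j] multiplies the local offset (u = j - r, v = i - r)."""
    k = weight.shape[0]
    h, w = feat.shape
    offsets = azimuth_rotated_offsets(h, w, k)  # (H, W, K*K, 2)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = xs[..., None] + offsets[..., 0]
    sy = ys[..., None] + offsets[..., 1]
    samples = bilinear_sample(feat, sx, sy)  # (H, W, K*K)
    return samples @ weight.ravel()
```

Because each kernel is expressed in a local radial/tangential frame, rotating the input map by 90° about the ego vehicle rotates the output by the same 90°, apart from the center cell, whose azimuth is undefined. For example, off-center, `azimuth_conv2d(np.rot90(feat), w)` matches `np.rot90(azimuth_conv2d(feat, w))`. A standard convolution with a fixed grid lacks this property unless its kernel is itself rotation-symmetric.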