Current state-of-the-art 3D object detectors follow a two-stage paradigm. These methods typically comprise two steps: 1) use a region proposal network to generate a handful of high-quality proposals in a bottom-up fashion; 2) resize and pool the semantic features from the proposed regions to summarize RoI-wise representations for further refinement. Note that the RoI-wise representations in step 2) are treated individually, as uncorrelated entries, when fed to the subsequent detection heads. However, we observe that the proposals generated in step 1) are offset from the ground truth to some degree and cluster densely in local neighborhoods with an underlying probability. Challenges arise when a proposal loses much of its boundary information because of this coordinate offset, while existing networks lack a corresponding mechanism to compensate for the missing information. In this paper, we propose $BADet$ for 3D object detection from point clouds. Specifically, instead of refining each proposal independently as previous works do, we represent each proposal as a node and construct a graph among the proposals that fall within a given cut-off threshold, associating them in the form of a local neighborhood graph so that the boundary correlations of an object are explicitly exploited. In addition, we devise a lightweight Region Feature Aggregation Module that fully exploits voxel-wise, pixel-wise, and point-wise features with expanding receptive fields to obtain more informative RoI-wise representations. We validate BADet on both the widely used KITTI dataset and the highly challenging nuScenes dataset. As of Apr. 17th, 2021, BADet achieves on-par performance on the KITTI 3D detection leaderboard and ranks $1^{st}$ on the $Moderate$ difficulty of the $Car$ category on the KITTI BEV detection leaderboard. The source code is available at https://github.com/rui-qian/BADet.
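To make the idea of a local neighborhood graph over proposals concrete, the following is a minimal sketch and not the BADet implementation. It assumes that proposals are linked when the Euclidean distance between their box centers is within the cut-off threshold, and it uses a plain mean over each node and its neighbors as a stand-in for a learned graph layer. The abstract specifies neither the distance measure nor the aggregation rule, so both are assumptions, and the names `build_neighborhood_graph`, `aggregate_mean`, and `cutoff` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Center = Tuple[float, float, float]


def build_neighborhood_graph(
    centers: Sequence[Center], cutoff: float
) -> Dict[int, List[int]]:
    """Link every pair of proposals whose centers lie within `cutoff`.

    Assumption: Euclidean center distance is the linking criterion.
    """
    graph: Dict[int, List[int]] = {i: [] for i in range(len(centers))}
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) <= cutoff:
                graph[i].append(j)
                graph[j].append(i)
    return graph


def aggregate_mean(
    features: Sequence[Sequence[float]], graph: Dict[int, List[int]]
) -> List[List[float]]:
    """Average each node's feature vector with those of its neighbors.

    Assumption: mean aggregation replaces whatever learned layer is used.
    """
    refined = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in graph[i]]
        refined.append([sum(column) / len(group) for column in zip(*group)])
    return refined


# Toy data: two nearby proposals around one object, one far-away proposal.
centers = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]

graph = build_neighborhood_graph(centers, cutoff=1.0)
refined = aggregate_mean(features, graph)

print(graph)    # {0: [1], 1: [0], 2: []}
print(refined)  # [[2.0, 1.0], [2.0, 1.0], [5.0, 5.0]]
```

In this toy case the two nearby proposals exchange information and end up with a shared representation, while the isolated proposal is left unchanged. This mirrors the motivation that neighboring proposals can compensate for boundary information that a single offset proposal has lost.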