A popular approach to constructing bird's-eye-view (BEV) representations for 3D detection is to lift 2D image features into the viewing frustum space using an explicitly predicted depth distribution. However, a depth distribution can only characterize the 3D geometry of visible object surfaces; it fails to capture their internal space and overall geometric structure, leading to sparse and unsatisfactory 3D representations. To mitigate this issue, we present BEV-IO, a new 3D detection paradigm that enhances BEV representations with instance occupancy information. At the core of our method is the newly designed instance occupancy prediction (IOP) module, which infers the point-level occupancy status of each instance in the frustum space. To ensure training efficiency while maintaining representational flexibility, the module is trained with a combination of explicit and implicit supervision. With the predicted occupancy, we further design a geometry-aware feature propagation (GFP) mechanism, which performs self-attention based on the occupancy distribution along each ray in the frustum and enforces instance-level feature consistency. By integrating the IOP module with the GFP mechanism, our BEV-IO detector renders highly informative 3D scene structures with more comprehensive BEV representations. Experimental results demonstrate that BEV-IO can outperform state-of-the-art methods while adding only a negligible increase in parameters (0.2%) and computational overhead (0.24% in GFLOPs).
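To make the lifting and propagation steps described above more concrete, the following is a minimal NumPy sketch, not the BEV-IO implementation. It assumes a single camera, random stand-in tensors instead of network predictions, and a simple occupancy-product affinity as a stand-in for the GFP self-attention; the function names `lift_features` and `propagate_along_ray` and all tensor shapes are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lift_features(feat, weights):
    """Lift 2D features into the frustum by a per-pixel outer product.

    feat:    (C, H, W) image features
    weights: (D, H, W) per-depth-bin weights (depth distribution or occupancy)
    returns: (C, D, H, W) frustum features
    """
    return feat[:, None, :, :] * weights[None, :, :, :]

def propagate_along_ray(frustum, occupancy, eps=1e-6):
    """Mix features between depth bins of the same ray (assumed attention form).

    The affinity between bins i and j is the product of their occupancies,
    normalised over j. This is an illustrative assumption, not the paper's
    exact GFP formulation.
    """
    affinity = occupancy[:, None] * occupancy[None, :]           # (D, D, H, W)
    attn = affinity / (affinity.sum(axis=1, keepdims=True) + eps)
    # out[c, i, h, w] = sum_j attn[i, j, h, w] * frustum[c, j, h, w]
    return np.einsum("ijhw,cjhw->cihw", attn, frustum)

# Random stand-ins for network outputs (hypothetical sizes).
rng = np.random.default_rng(0)
C, D, H, W = 4, 8, 2, 3
feat = rng.normal(size=(C, H, W))
depth = softmax(rng.normal(size=(D, H, W)), axis=0)         # sums to 1 along each ray
occupancy = 1.0 / (1.0 + np.exp(-rng.normal(size=(D, H, W))))  # independent per bin

depth_frustum = lift_features(feat, depth)       # surface-only weighting
occ_frustum = lift_features(feat, occupancy)     # can fill an object's interior
propagated = propagate_along_ray(occ_frustum, occupancy)
```

The contrast the sketch is meant to show is that a depth distribution is normalised along each ray, so its mass concentrates near the visible surface, whereas per-bin occupancy values are independent and can mark several consecutive bins inside an object as occupied.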