A popular approach for constructing bird's-eye-view (BEV) representations in 3D detection is to lift 2D image features into the viewing frustum space based on an explicitly predicted depth distribution. However, a depth distribution can only characterize the 3D geometry of visible object surfaces; it fails to capture their internal space and overall geometric structure, leading to sparse and unsatisfactory 3D representations. To mitigate this issue, we present BEV-IO, a new 3D detection paradigm that enhances the BEV representation with instance occupancy information. At the core of our method is the newly designed instance occupancy prediction (IOP) module, which aims to infer the point-level occupancy status of each instance in the frustum space. To ensure training efficiency while maintaining representational flexibility, the module is trained with a combination of explicit and implicit supervision. With the predicted occupancy, we further design a geometry-aware feature propagation (GFP) mechanism, which performs self-attention based on the occupancy distribution along each ray in the frustum and is able to enforce instance-level feature consistency. By integrating the IOP module with the GFP mechanism, our BEV-IO detector is able to render highly informative 3D scene structures with more comprehensive BEV representations. Experimental results demonstrate that BEV-IO can outperform state-of-the-art methods while adding only a negligible increase in parameters (0.2%) and computational overhead (0.24% in GFLOPs).
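To make the lifting step and the role of occupancy concrete, the following is a minimal NumPy sketch, not the BEV-IO implementation. The function names, tensor shapes, the element-wise maximum used to combine depth and occupancy weights, and the occupancy-product attention along each ray are all illustrative assumptions; the abstract does not specify how the IOP output or the GFP attention is actually computed.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lift_with_depth(feat, depth_logits):
    """Depth-based lifting: outer product of image features and a per-pixel
    depth distribution.

    feat:         (C, H, W) image features
    depth_logits: (D, H, W) unnormalized scores over D depth bins per pixel
    returns:      (C, D, H, W) frustum features
    """
    depth_prob = softmax(depth_logits, axis=0)  # sums to 1 along each ray
    return feat[:, None] * depth_prob[None]


def lift_with_depth_and_occupancy(feat, depth_logits, occ_logits):
    """Assumed variant: weight each frustum point by the larger of its depth
    probability and its occupancy probability, so points inside an object
    (not only on its visible surface) can receive features.
    The max() fusion is a hypothetical choice made for illustration.
    """
    depth_prob = softmax(depth_logits, axis=0)
    occ_prob = sigmoid(occ_logits)              # independent per-point occupancy
    weight = np.maximum(depth_prob, occ_prob)
    return feat[:, None] * weight[None]


def propagate_along_ray(frustum, occ_prob, eps=1e-6):
    """Assumed occupancy-driven attention along each ray: points i and j on
    the same ray exchange features in proportion to occ[i] * occ[j].

    frustum:  (C, D, H, W)
    occ_prob: (D, H, W)
    """
    attn = occ_prob[:, None] * occ_prob[None, :]           # (D, D, H, W)
    attn = attn / (attn.sum(axis=1, keepdims=True) + eps)   # normalize over j
    mixed = np.einsum("ijhw,cjhw->cihw", attn, frustum)
    return frustum + mixed                                  # residual update


rng = np.random.default_rng(0)
C, D, H, W = 4, 5, 2, 3
feat = rng.normal(size=(C, H, W))
depth_logits = rng.normal(size=(D, H, W))
occ_logits = rng.normal(size=(D, H, W))

frustum_depth = lift_with_depth(feat, depth_logits)
frustum_occ = lift_with_depth_and_occupancy(feat, depth_logits, occ_logits)
frustum_out = propagate_along_ray(frustum_occ, sigmoid(occ_logits))
```

In this sketch, depth-only lifting concentrates each pixel's feature near the predicted surface, whereas the occupancy-weighted variant can also place features at points behind the surface that are predicted to be occupied, which is the sparsity issue the abstract describes.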