Efficient and accurate 3D occupancy prediction is vital to the performance of autonomous driving systems. However, existing methods struggle to balance precision and efficiency: high-accuracy approaches are often hindered by heavy computational overhead and therefore slow inference, while others rely on pure bird's-eye-view (BEV) representations to gain speed at the cost of vertical spatial cues and geometric integrity. To overcome these limitations, we build on the efficient Lift-Splat-Shoot (LSS) paradigm and propose DA-Occ, a purely 2D framework for 3D occupancy prediction that preserves fine-grained geometry. Standard LSS-based methods lift 2D features into 3D space based solely on depth scores, making it difficult to fully capture vertical structure. DA-Occ therefore augments depth-based lifting with a complementary height-score projection that explicitly encodes vertical geometric information. We further employ direction-aware convolutions to extract geometric features along both vertical and horizontal orientations, balancing accuracy against computational cost. On the Occ3D-nuScenes benchmark, the proposed method achieves 39.3% mIoU at an inference speed of 27.7 FPS, effectively balancing accuracy and efficiency. In simulations on edge devices, inference reaches 14.8 FPS, further demonstrating the method's suitability for real-time deployment in resource-constrained environments.
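The two ideas named above (augmenting depth-score lifting with a height-score projection, and direction-aware convolution over vertical and horizontal orientations) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the module name, layer choices, kernel sizes, and tensor shapes are all assumptions for exposition.

```python
import torch
import torch.nn as nn


class DirectionAwareLiftSketch(nn.Module):
    """Hypothetical sketch of the abstract's two components:
    (1) lifting 2D image features with BOTH a depth-score and a
        height-score distribution (standard LSS uses depth only),
    (2) direction-aware convolution via separate vertical (k,1)
        and horizontal (1,k) kernels.
    All names and shapes are illustrative assumptions."""

    def __init__(self, channels: int, depth_bins: int, height_bins: int):
        super().__init__()
        # per-pixel categorical scores over discretized depth / height
        self.depth_head = nn.Conv2d(channels, depth_bins, kernel_size=1)
        self.height_head = nn.Conv2d(channels, height_bins, kernel_size=1)
        # direction-aware convs: vertical and horizontal 1D kernels
        self.vertical = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.horizontal = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))

    def forward(self, feat: torch.Tensor):
        # feat: (B, C, H, W) image features from a 2D backbone
        feat = self.vertical(feat) + self.horizontal(feat)

        depth = self.depth_head(feat).softmax(dim=1)    # (B, D, H, W)
        height = self.height_head(feat).softmax(dim=1)  # (B, Z, H, W)

        # outer products lift 2D features along the depth and height axes
        lifted_depth = depth.unsqueeze(1) * feat.unsqueeze(2)   # (B, C, D, H, W)
        lifted_height = height.unsqueeze(1) * feat.unsqueeze(2)  # (B, C, Z, H, W)
        return lifted_depth, lifted_height


# usage: lift an 8-channel feature map over 4 depth bins and 6 height bins
module = DirectionAwareLiftSketch(channels=8, depth_bins=4, height_bins=6)
d, h = module(torch.randn(2, 8, 16, 16))
```

In this sketch, the depth branch mirrors the LSS "lift" step, while the height branch plays the complementary role described in the abstract: it distributes the same 2D features along a vertical axis so that vertical structure is represented explicitly rather than being collapsed into a BEV plane.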