Understanding the 3D geometry and semantics of driving scenes is critical for safe autonomous driving. Recent advances in 3D occupancy prediction have improved scene representation but often suffer from spatial inconsistencies, leading to floating artifacts and poor surface localization. Existing voxel-wise losses (e.g., cross-entropy) fail to enforce geometric coherence. In this paper, we propose GaussRender, a module that improves 3D occupancy learning by enforcing projective consistency. Our key idea is to project both predicted and ground-truth 3D occupancy into 2D camera views, where we apply supervision. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure. To achieve this efficiently, we leverage differentiable rendering with Gaussian splatting. GaussRender seamlessly integrates with existing architectures while maintaining efficiency and requiring no inference-time modifications. Extensive evaluations on multiple benchmarks (SurroundOcc-nuScenes, Occ3D-nuScenes, SSCBench-KITTI360) demonstrate that GaussRender significantly improves geometric fidelity across various 3D occupancy models (TPVFormer, SurroundOcc, Symphonies), achieving state-of-the-art results, particularly on surface-sensitive metrics. The code is open-sourced at https://github.com/valeoai/GaussRender.
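To make the projective-consistency idea concrete, here is a minimal toy sketch. It replaces the paper's differentiable Gaussian-splatting renderer with a non-differentiable orthographic "first-hit" projection along one axis, and the 2D supervision with a simple pixel-disagreement rate; all function names are illustrative, not from the GaussRender codebase. The point it demonstrates is the abstract's claim: a floating artifact that barely moves a voxel-wise loss can change the rendered 2D view substantially, so a projection-space penalty catches it.

```python
import copy

def render_first_hit(occ):
    # occ[d][h][w]: integer class labels on a D x H x W grid, 0 = empty.
    # Toy orthographic render: each (h, w) pixel takes the class of the
    # first non-empty voxel along the depth axis (nearest-surface hit).
    D, H, W = len(occ), len(occ[0]), len(occ[0][0])
    img = [[0] * W for _ in range(H)]
    for h in range(H):
        for w in range(W):
            for d in range(D):
                if occ[d][h][w]:
                    img[h][w] = occ[d][h][w]
                    break
    return img

def projective_consistency_loss(pred, gt):
    # Fraction of pixels where the two 2D projections disagree -- a crude
    # stand-in for the differentiable rendered losses used in the paper.
    pi, gi = render_first_hit(pred), render_first_hit(gt)
    H, W = len(gi), len(gi[0])
    return sum(pi[h][w] != gi[h][w] for h in range(H) for w in range(W)) / (H * W)

# A flat surface of class 1 at depth 2 in a 4 x 2 x 2 grid.
gt = [[[0, 0], [0, 0]] for _ in range(4)]
gt[2] = [[1, 1], [1, 1]]

# One floating voxel (class 2) in front of the surface: 1/16 voxels differ,
# yet 1/4 of the rendered pixels change class.
pred = copy.deepcopy(gt)
pred[0][0][0] = 2
```

Here `projective_consistency_loss(pred, gt)` is 0.25 while only a single voxel of sixteen was altered, illustrating why 2D projective supervision penalizes floating artifacts more sharply than voxel-wise cross-entropy. In GaussRender the projection is a differentiable splatting of Gaussians through real camera intrinsics/extrinsics, so this signal backpropagates to the 3D prediction.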