Existing vision-based 3D occupancy prediction methods are inherently limited in accuracy because they rely exclusively on street-view imagery, neglecting the potential benefits of incorporating satellite views. We propose SA-Occ, the first Satellite-Assisted 3D occupancy prediction model, which leverages GPS and IMU data to integrate historical yet readily available satellite imagery into real-time applications, effectively mitigating the limitations of ego-vehicle perception, including occlusions and degraded performance in distant regions. To address the core challenges of cross-view perception, we propose: 1) Dynamic-Decoupling Fusion, which resolves inconsistencies in dynamic regions caused by the temporal asynchrony between satellite and street views; 2) 3D-Proj Guidance, a module that enhances 3D feature extraction from inherently 2D satellite imagery; and 3) Uniform Sampling Alignment, which aligns the sampling density between street and satellite views. Evaluated on Occ3D-nuScenes, SA-Occ achieves state-of-the-art performance, especially among single-frame methods, with a 39.05% mIoU (a 6.97% improvement), while incurring only 6.93 ms of additional latency per frame. Our code and newly curated dataset are available at https://github.com/chenchen235/SA-Occ.
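As background on how a GPS fix can index into pre-stored satellite imagery, the lookup can be sketched with the standard Web Mercator (slippy-map) tiling scheme used by most satellite tile services. This is an illustrative sketch only: the function name and zoom level are assumptions, not part of SA-Occ's released code.

```python
import math

def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Map a WGS-84 GPS fix to Web Mercator (slippy-map) tile indices.

    At zoom z, the world is divided into a 2^z x 2^z grid of tiles;
    the returned (x, y) pair identifies the satellite tile covering
    the given position.
    """
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y
```

Because the tiles are historical, only static structure (roads, buildings) is reliable across views, which is precisely the asynchrony that Dynamic-Decoupling Fusion is designed to handle.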