Environment prediction frameworks are critical for the safe navigation of autonomous vehicles (AVs) in dynamic settings. LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye-view scene representation that enables self-supervised joint scene prediction while exhibiting resilience to partial observability and perception failures. Prior approaches have focused on deterministic L-OGM prediction architectures that operate directly in grid-cell space. While these methods have achieved some success, they frequently produce unrealistic predictions and fail to capture the stochastic nature of the environment. Moreover, they do not effectively integrate the additional sensor modalities available on AVs. Our proposed framework, Latent Occupancy Prediction (LOPR), performs stochastic L-OGM prediction in the latent space of a generative architecture and allows for conditioning on RGB cameras, maps, and planned trajectories. We decode predictions using either a single-step decoder, which provides high-quality predictions in real time, or a diffusion-based batch decoder, which can further refine the decoded frames to address temporal consistency issues and reduce compression losses. Our experiments on the nuScenes and Waymo Open datasets show that all variants of our approach qualitatively and quantitatively outperform prior methods.
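To make the pipeline concrete, below is a minimal PyTorch sketch of the latent-space prediction scheme the abstract describes: an encoder compresses each L-OGM into a latent vector, a stochastic sequence model predicts future latents (with optional conditioning tokens, e.g. camera or map embeddings), and a single-step decoder maps predicted latents back to grids. All module architectures, dimensions, and names (`OGMEncoder`, `LatentPredictor`, `SingleStepDecoder`) are illustrative assumptions rather than the authors' implementation, and the diffusion-based batch decoder is omitted.

```python
# Minimal sketch of latent-space L-OGM prediction (assumed architecture,
# not the authors' implementation).
import torch
import torch.nn as nn

class OGMEncoder(nn.Module):
    """Compresses one bird's-eye-view occupancy grid into a latent vector."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 128 -> 64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, latent_dim),
        )

    def forward(self, grid):  # grid: (B, 1, 128, 128)
        return self.net(grid)

class SingleStepDecoder(nn.Module):
    """Maps a predicted latent back to an occupancy grid in one pass."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):  # z: (B, latent_dim)
        h = self.fc(z).view(-1, 128, 16, 16)
        return self.net(h)  # occupancy probabilities in [0, 1]

class LatentPredictor(nn.Module):
    """Autoregressively samples future latents from past ones; conditioning
    tokens (e.g. camera/map embeddings) are simply prepended to the sequence."""
    def __init__(self, latent_dim=128, n_future=5):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=4,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(latent_dim, 2 * latent_dim)  # mean and log-variance
        self.n_future = n_future

    def forward(self, past, cond=None):  # past: (B, T, D), cond: (B, C, D)
        seq = past if cond is None else torch.cat([cond, past], dim=1)
        preds = []
        for _ in range(self.n_future):
            h = self.backbone(seq)
            mu, logvar = self.head(h[:, -1]).chunk(2, dim=-1)
            # Reparameterized sample: the stochasticity lives in latent space.
            z_next = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            preds.append(z_next)
            seq = torch.cat([seq, z_next.unsqueeze(1)], dim=1)
        return torch.stack(preds, dim=1)  # (B, n_future, D)

# Forward pass over dummy data: encode past grids, sample future latents,
# and decode each one with the fast single-step decoder.
encoder, predictor, decoder = OGMEncoder(), LatentPredictor(), SingleStepDecoder()
past_grids = torch.rand(2, 4, 1, 128, 128)  # (B, T, 1, H, W)
z_past = torch.stack([encoder(past_grids[:, t]) for t in range(4)], dim=1)
z_future = predictor(z_past)                                   # (B, 5, D)
future_grids = decoder(z_future.flatten(0, 1)).view(2, 5, 1, 128, 128)
```

Predicting in a compact latent space keeps the sequence model lightweight, and sampling the latent, rather than regressing grid cells directly, is one way to capture the stochasticity the abstract emphasizes; re-running the predictor yields distinct plausible futures for the same observed past.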