3D occupancy prediction is important for autonomous driving because it provides a comprehensive perception of the surroundings. To incorporate sequential inputs, most existing methods fuse representations from previous frames to infer the current 3D occupancy. However, they fail to consider the continuity of driving scenarios and ignore the strong prior provided by the evolution of 3D scenes (e.g., only dynamic objects move). In this paper, we propose a world-model-based framework that exploits scene evolution for perception. We reformulate 3D occupancy prediction as a 4D occupancy forecasting problem conditioned on the current sensor input. We decompose the scene evolution into three factors: 1) ego-motion alignment of static scenes; 2) local movements of dynamic objects; and 3) completion of newly observed scenes. We then employ a Gaussian world model (GaussianWorld) to explicitly exploit these priors and infer the scene evolution in the 3D Gaussian space conditioned on the current RGB observation. We evaluate the effectiveness of our framework on the widely used nuScenes dataset. GaussianWorld improves the performance of its single-frame counterpart by over 2% in mIoU without introducing additional computation. Code: https://github.com/zuosc19/GaussianWorld.
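To make the first of the three evolution factors concrete, below is a minimal sketch (not the authors' implementation; the function name and array layout are illustrative assumptions) of ego-motion alignment of static scenes in a 3D Gaussian representation: under a rigid SE(3) ego-motion transform between frames, static Gaussians are rigidly transported while their scales are unchanged.

```python
# Minimal sketch of factor 1 (ego-motion alignment of static scenes).
# This is NOT the paper's code: the function name, signature, and array
# layout are assumptions made for illustration. Each Gaussian is assumed
# to carry a mean (N, 3) and an orientation as a rotation matrix (N, 3, 3);
# scales are invariant under a rigid transform, so they are left untouched.
import numpy as np

def align_static_gaussians(means, rotations, R_ego, t_ego):
    """Rigidly transform static Gaussians by the ego motion (R_ego, t_ego).

    means:     (N, 3) Gaussian centers in the previous ego frame.
    rotations: (N, 3, 3) Gaussian orientations in the previous ego frame.
    R_ego:     (3, 3) rotation of the previous frame w.r.t. the current one.
    t_ego:     (3,) translation of the previous frame w.r.t. the current one.
    """
    new_means = means @ R_ego.T + t_ego      # x' = R x + t for each center
    new_rotations = R_ego[None] @ rotations  # compose orientations with R
    return new_means, new_rotations
```

In this view, only the dynamic Gaussians require a learned local-movement update (factor 2), and newly observed regions are filled by completion (factor 3), which is what lets the world model reuse most of the previous frame's representation at no extra cost.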