Integrating Specialized and Generic Agent Motion Prediction with Dynamic Occupancy Grid Maps

Accurate prediction of driving scenes is a challenging task due to uncertainty in sensor data, the complex behaviors of agents, and the possibility of multiple feasible futures. Existing prediction methods based on occupancy grid maps primarily focus on agent-agnostic scene predictions, while agent-specific predictions provide specialized behavioral insights with the help of semantic information. However, both paradigms face distinct limitations: agent-agnostic models struggle to capture the behavioral complexities of dynamic actors, whereas agent-specific approaches fail to generalize to poorly perceived or unrecognized agents. Combining the two enables more robust and safer motion forecasting. To this end, we propose a unified framework that leverages Dynamic Occupancy Grid Maps within a streamlined temporal decoding pipeline to simultaneously predict future occupancy state grids, vehicle grids, and scene flow grids. Relying on a lightweight spatiotemporal backbone, our approach is centered on a tailored, interdependent loss function that captures inter-grid dependencies and enables diverse future predictions. By using occupancy state information to enforce flow-guided transitions, the loss function acts as a regularizer that directs occupancy evolution while accounting for obstacles and occlusions. Consequently, the model not only predicts the specific behaviors of vehicle agents, but also identifies other dynamic entities and anticipates their evolution within the complex scene. Evaluations on the real-world nuScenes and Woven Planet datasets demonstrate superior prediction performance for dynamic vehicles and generic dynamic scene elements compared to baseline methods.

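The abstract does not give the form of the loss, so the following is only a minimal sketch of what a flow-guided occupancy transition penalty could look like, not the authors' formulation. It assumes integer per-cell displacements, a boolean static-obstacle mask, and a mean squared error between the flow-warped current occupancy and the predicted next occupancy. The names `warp_occupancy` and `flow_consistency_loss` are hypothetical, and a real model would use a differentiable warp in a deep learning framework rather than plain Python lists.

```python
from typing import List, Tuple

Grid = List[List[float]]                 # occupancy probability per cell
FlowGrid = List[List[Tuple[int, int]]]   # per-cell (d_row, d_col) displacement, in cells
Mask = List[List[bool]]                  # True where a static obstacle sits


def warp_occupancy(occ: Grid, flow: FlowGrid, static: Mask) -> Grid:
    """Move each cell's occupancy along its flow vector.

    Mass that would leave the grid or land on a static obstacle stays in
    place. This blocking rule is an illustrative assumption.
    """
    rows, cols = len(occ), len(occ[0])
    warped = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dr, dc = flow[r][c]
            tr, tc = r + dr, c + dc
            inside = 0 <= tr < rows and 0 <= tc < cols
            if not inside or static[tr][tc]:
                tr, tc = r, c  # blocked transition: keep the mass where it is
            # Clamp so several sources landing on one cell stay within [0, 1].
            warped[tr][tc] = min(1.0, warped[tr][tc] + occ[r][c])
    return warped


def flow_consistency_loss(occ_t: Grid, flow_t: FlowGrid,
                          occ_next_pred: Grid, static: Mask) -> float:
    """Mean squared error between warped occupancy and the predicted next grid."""
    warped = warp_occupancy(occ_t, flow_t, static)
    n_cells = len(warped) * len(warped[0])
    total = sum((w - p) ** 2
                for w_row, p_row in zip(warped, occ_next_pred)
                for w, p in zip(w_row, p_row))
    return total / n_cells


if __name__ == "__main__":
    # One occupied cell at (1, 0) moving one cell to the right on a 3x3 grid.
    occ = [[0.0] * 3 for _ in range(3)]
    occ[1][0] = 1.0
    flow = [[(0, 0)] * 3 for _ in range(3)]
    flow[1][0] = (0, 1)
    static = [[False] * 3 for _ in range(3)]

    moved = warp_occupancy(occ, flow, static)
    print(flow_consistency_loss(occ, flow, moved, static))  # prediction follows the flow
    print(flow_consistency_loss(occ, flow, occ, static))    # prediction ignores the flow
```

In this toy setup, a prediction that follows the flow incurs zero penalty, while a prediction that leaves the occupied cell in place is penalized. This illustrates how such a term could act as a regularizer coupling the occupancy and flow outputs.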