Navigating complex and dynamic environments requires autonomous vehicles (AVs) to reason about both visible and occluded regions. This involves predicting the future motion of observed agents, inferring occluded ones, and modeling their interactions based on vectorized scene representations of the partially observable environment. However, prior work on occlusion inference and trajectory prediction has developed in isolation, with the former based on simplified rasterized methods and the latter assuming full environment observability. We introduce the Scene Informer, a unified approach for predicting observed agent trajectories and inferring occlusions in a partially observable setting. It uses a transformer to aggregate various input modalities and to facilitate selective queries on occlusions that might intersect with the AV's planned path. The framework estimates occupancy probabilities and likely trajectories for occlusions, and forecasts motion for observed agents. We explore common observability assumptions in both domains and their performance impact. Our approach outperforms existing methods in both occupancy prediction and trajectory prediction in a partially observable setting on the Waymo Open Motion Dataset.
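The idea of selectively querying occlusions near the AV's planned path can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_queries` and `cross_attention`, the distance threshold `radius`, the random projection matrices, and the single-head attention without learned weights are all assumptions made for illustration. The sketch keeps only the occluded query points that lie close to the planned path, lets them attend to a set of scene tokens, and maps the result to an occupancy probability.

```python
import numpy as np

def select_queries(occluded_points, planned_path, radius):
    """Keep occluded query points within `radius` of any planned-path waypoint.

    Hypothetical helper: the distance-based selection rule is an assumption.
    """
    # Pairwise differences, shape (num_points, num_waypoints, 2).
    diffs = occluded_points[:, None, :] - planned_path[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    mask = dists.min(axis=1) <= radius
    return occluded_points[mask], mask

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    # Subtract the row-wise maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)

# Straight planned path along the x-axis, one waypoint every 2 m (toy data).
planned_path = np.stack([np.linspace(0.0, 20.0, 11), np.zeros(11)], axis=1)
occluded_points = np.array([[5.0, 1.0], [10.0, 30.0], [18.0, -2.0]])

selected, mask = select_queries(occluded_points, planned_path, radius=3.0)

# Stand-ins for encoded scene context (agents, map, occlusion polygons).
scene_tokens = rng.normal(size=(6, 8))
w_query = rng.normal(size=(2, 8))   # hypothetical position embedding
w_occ = rng.normal(size=(8,))       # hypothetical occupancy head

queries = selected @ w_query
context, weights = cross_attention(queries, scene_tokens, scene_tokens)
occupancy = 1.0 / (1.0 + np.exp(-(context @ w_occ)))  # sigmoid

print(mask)
print(occupancy.shape)
```

Restricting the query set this way means the cost of the attention step scales with the number of path-relevant occluded points rather than with the full occluded area, which is one plausible motivation for query-based occlusion inference.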