Navigating complex and dynamic environments requires autonomous vehicles (AVs) to reason about both visible and occluded regions. This involves predicting the future motion of observed agents, inferring occluded ones, and modeling their interactions based on vectorized scene representations of the partially observable environment. However, prior work on occlusion inference and trajectory prediction has developed in isolation, with the former based on simplified rasterized methods and the latter assuming full environment observability. We introduce the Scene Informer, a unified approach for predicting observed agent trajectories and inferring occlusions in a partially observable setting. It uses a transformer to aggregate various input modalities and facilitate selective queries on occlusions that might intersect with the AV's planned path. The framework estimates occupancy probabilities and likely trajectories for occlusions, and forecasts motion for observed agents. We explore common observability assumptions in both domains and their performance impact. Our approach outperforms existing methods in both occupancy prediction and trajectory prediction in a partially observable setting on the Waymo Open Motion Dataset.
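To make the idea of selective queries concrete, the following is a minimal sketch of one way to pick out occluded query points that lie close to the AV's planned path. It is an illustration only, not the Scene Informer's actual query mechanism: the function names `point_to_segment_distance` and `select_occlusion_queries`, the 2D polyline path representation, and the `max_distance` threshold are all assumptions introduced here.

```python
import math


def point_to_segment_distance(p, a, b):
    """Euclidean distance from 2D point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def select_occlusion_queries(planned_path, occluded_points, max_distance):
    """Keep occluded points within max_distance of the path polyline.

    planned_path: list of at least two (x, y) waypoints (hypothetical format).
    occluded_points: list of (x, y) candidate query locations in occluded space.
    max_distance: hypothetical relevance threshold, in the same units as the points.
    """
    segments = list(zip(planned_path, planned_path[1:]))
    selected = []
    for p in occluded_points:
        d = min(point_to_segment_distance(p, a, b) for a, b in segments)
        if d <= max_distance:
            selected.append(p)
    return selected


planned_path = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
occluded_points = [(5.0, 1.0), (5.0, 8.0), (12.0, 5.0), (20.0, 20.0)]
queries = select_occlusion_queries(planned_path, occluded_points, max_distance=2.5)
print(queries)  # [(5.0, 1.0), (12.0, 5.0)]
```

In this sketch, only the selected points would be passed on as occlusion queries, so occluded regions far from the planned path are never evaluated. A learned model could replace the fixed distance threshold with any other relevance criterion.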