Navigating complex and dynamic environments requires autonomous vehicles (AVs) to reason about both visible and occluded regions. This involves predicting the future motion of observed agents, inferring occluded ones, and modeling their interactions based on vectorized scene representations of the partially observable environment. However, prior work on occlusion inference and trajectory prediction has developed in isolation, with the former relying on simplified rasterized methods and the latter assuming full environment observability. We introduce the Scene Informer, a unified approach for both predicting observed agent trajectories and inferring occlusions in a partially observable setting. It uses a transformer to aggregate various input modalities and to facilitate selective queries on occlusions that might intersect with the AV's planned path. The framework estimates occupancy probabilities and likely trajectories for occlusions, and forecasts motion for observed agents. We explore common observability assumptions in both domains and their impact on performance. Our approach outperforms existing methods in both occupancy prediction and trajectory prediction in a partially observable setting on the Waymo Open Motion Dataset.
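The abstract mentions selective queries on occlusions that might intersect with the AV's planned path, without saying how such queries are chosen. As a minimal sketch of one plausible selection rule, and not the Scene Informer's actual method, the snippet below keeps only those occluded 2D points that lie within a distance threshold of the planned path polyline. The function names, the point-based representation of occlusions, and the `max_distance` parameter are all assumptions made for illustration.

```python
import math


def point_to_segment_distance(p, a, b):
    """Euclidean distance from 2D point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, then clamp to the endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def select_occlusion_queries(occluded_points, planned_path, max_distance):
    """Hypothetical helper: keep occluded points near the planned path.

    occluded_points: list of (x, y) candidate query locations (assumed input).
    planned_path:    list of (x, y) waypoints forming a polyline.
    max_distance:    assumed relevance threshold, in the same units as the points.
    """
    if len(planned_path) < 2:
        raise ValueError("planned_path needs at least two waypoints")
    segments = list(zip(planned_path[:-1], planned_path[1:]))
    return [
        p
        for p in occluded_points
        if min(point_to_segment_distance(p, a, b) for a, b in segments)
        <= max_distance
    ]


# Illustrative data only: an L-shaped path and four occluded candidates.
planned_path = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
occluded_points = [(5.0, 1.0), (5.0, 8.0), (12.0, 5.0), (20.0, 20.0)]
queries = select_occlusion_queries(occluded_points, planned_path, max_distance=3.0)
```

In a learned model, the selected points would presumably be encoded as query tokens for the transformer. That step is omitted here because the abstract does not describe it.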