Trajectory prediction with uncertainty is a critical and challenging task for autonomous driving. Sensor data represented in multiple views is now readily available. However, existing models have not evaluated cross-view consistency, which can lead to divergence between the multimodal predictions made from different views. Predictions from a network that does not comprehend the 3D scene are neither practical nor effective, and they can leave the downstream module in a dilemma. Instead, we predict multimodal trajectories while maintaining cross-view consistency. We present a cross-view trajectory prediction method using shared 3D queries (XVTP3D). We employ a set of 3D queries shared across views to generate multiple goals that are cross-view consistent. We also propose a random mask method and coarse-to-fine cross-attention to capture robust cross-view features. To the best of our knowledge, this is the first work to introduce the top-down paradigm from the bird's-eye-view (BEV) detection field to the trajectory prediction problem. Experiments on two publicly available datasets show that XVTP3D achieves state-of-the-art performance with consistent cross-view predictions.