Existing trajectory prediction methods have two shortcomings: (i) nearly all models rely on high-definition (HD) maps, yet map information is not always available in real traffic scenes, and building HD maps is expensive and time-consuming; and (ii) existing models usually improve prediction accuracy at the expense of computational efficiency, yet efficiency is crucial for many real-world applications. To address both, this paper proposes an efficient trajectory prediction model that does not depend on traffic maps. The core idea of our model is to encode each agent's spatial-temporal information in the first stage and to explore multi-agent spatial-temporal interactions in the second stage. By combining an attention mechanism, an LSTM, a graph convolution network, and a temporal transformer across the two stages, our model learns rich dynamic and interaction information for all agents. On the Argoverse dataset, our model achieves the highest performance among existing map-free methods and also exceeds most map-based state-of-the-art methods. In addition, our model exhibits faster inference than the baseline methods.
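The two-stage idea described in the abstract can be illustrated with a deliberately simplified sketch. This is not the paper's implementation: the attention, LSTM, and temporal transformer components are replaced by a hand-written motion summary, the graph convolution is reduced to a single neighbour-averaging step, and all function names and the `radius` parameter are hypothetical. The only point it makes is structural: stage one looks at each agent alone, stage two mixes information across nearby agents, and no map input appears anywhere.

```python
import math


def encode_agent(track):
    """Stage 1 stand-in: summarise ONE agent's own history.

    `track` is a list of (x, y) positions. The returned feature vector is
    [mean dx, mean dy, last dx, last dy]; a learned encoder (e.g. an LSTM)
    would take the place of this hand-written summary.
    """
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]
    n = len(steps)
    mean_dx = sum(dx for dx, _ in steps) / n
    mean_dy = sum(dy for _, dy in steps) / n
    return [mean_dx, mean_dy, steps[-1][0], steps[-1][1]]


def interaction_weights(tracks, radius):
    """Row-normalised adjacency built from the last observed positions.

    Two agents are connected when they are within `radius` of each other.
    Each agent is always connected to itself, so every row sums to 1.
    """
    last = [track[-1] for track in tracks]
    rows = []
    for xi, yi in last:
        row = [1.0 if math.hypot(xi - xj, yi - yj) <= radius else 0.0
               for xj, yj in last]
        total = sum(row)
        rows.append([w / total for w in row])
    return rows


def interact(features, weights):
    """Stage 2 stand-in: one graph-convolution-style averaging step."""
    dim = len(features[0])
    return [[sum(w * f[d] for w, f in zip(row, features)) for d in range(dim)]
            for row in weights]


def two_stage_features(tracks, radius=10.0):
    """Run stage 1 per agent, then stage 2 across agents (no map input)."""
    features = [encode_agent(track) for track in tracks]
    return interact(features, interaction_weights(tracks, radius))


if __name__ == "__main__":
    tracks = [
        [(0, 0), (1, 0), (2, 0)],              # moving along +x
        [(0, 1), (0, 2), (0, 3)],              # moving along +y, near agent 0
        [(100, 100), (101, 100), (102, 100)],  # far from both others
    ]
    for agent_features in two_stage_features(tracks):
        print(agent_features)
```

In this toy scene the first two agents are within the assumed radius of each other, so their stage-two features become a blend of both motions, while the distant third agent keeps its own stage-one features unchanged.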