Motion prediction is an essential component of autonomous driving systems because it must handle highly uncertain and complex scenarios involving moving agents of different types. In this paper, we propose the Multi-Granular TRansformer (MGTR) framework, an encoder-decoder network that exploits context features at different granularities for different kinds of traffic agents. To further enhance MGTR's capabilities, we leverage LiDAR point cloud data by incorporating LiDAR semantic features from an off-the-shelf LiDAR feature extractor. We evaluate MGTR on the Waymo Open Dataset motion prediction benchmark and show that the proposed method achieves state-of-the-art performance, ranking 1st on its leaderboard (https://waymo.com/open/challenges/2023/motion-prediction/).