Motion prediction is an essential component of autonomous driving systems because it must handle highly uncertain and complex scenarios involving moving agents of different types. In this paper, we propose the Multi-Granular TRansformer (MGTR) framework, an encoder-decoder network that exploits context features at different granularities for different kinds of traffic agents. To further enhance MGTR's capabilities, we leverage LiDAR point cloud data by incorporating LiDAR semantic features from an off-the-shelf LiDAR feature extractor. We evaluate MGTR on the Waymo Open Dataset motion prediction benchmark and show that the proposed method achieves state-of-the-art performance, ranking 1st on its leaderboard (https://waymo.com/open/challenges/2023/motion-prediction/).