Real-world deployment of an autonomous driving system requires its components to run on-board and in real time, including the motion prediction module that predicts the future trajectories of surrounding traffic participants. Existing agent-centric methods have demonstrated outstanding performance on public benchmarks. However, they suffer from high computational overhead and scale poorly as the number of agents to be predicted increases. To address this problem, we introduce K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism that allows Transformers to use the pairwise-relative representation. Building on KNARPE, we present the Heterogeneous Polyline Transformer with Relative pose encoding (HPTR), a hierarchical framework that enables asynchronous token updates during online inference. By sharing contexts among agents and reusing unchanged contexts, our approach is as efficient as scene-centric methods while performing on par with state-of-the-art agent-centric methods. Experiments on the Waymo and Argoverse-2 datasets show that HPTR achieves superior performance among end-to-end methods that do not apply expensive post-processing or model ensembling. The code is available at https://github.com/zhejz/HPTR.
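To make the idea behind K-nearest neighbor attention with relative pose encoding more concrete, the following is a minimal pure-Python sketch. It is not the released HPTR implementation. It assumes each token carries a pose-free feature vector and a global pose `(x, y, heading)`, that neighbors are chosen by Euclidean distance, and that a linearly projected relative pose is added to the neighbor feature before the key and value projections. The function names (`matvec`, `relative_pose`, `knn`, `knn_attention`), the weight matrices, and the single-head design are all hypothetical choices made for illustration.

```python
import math


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def relative_pose(query, other):
    """Express pose `other` in the local frame of pose `query`.

    Poses are (x, y, heading) tuples given in a shared global frame.
    """
    dx, dy = other[0] - query[0], other[1] - query[1]
    c, s = math.cos(query[2]), math.sin(query[2])
    dh = other[2] - query[2]
    # Rotate the offset by -heading and wrap the heading difference to [-pi, pi].
    return (c * dx + s * dy, -s * dx + c * dy, math.atan2(math.sin(dh), math.cos(dh)))


def knn(poses, i, k):
    """Indices of the k tokens closest to token i (token i itself included)."""
    def dist(j):
        return math.hypot(poses[j][0] - poses[i][0], poses[j][1] - poses[i][1])
    return sorted(range(len(poses)), key=dist)[:k]


def knn_attention(feats, poses, w_q, w_k, w_v, w_rpe, k):
    """Single-head attention in which each token attends to its k nearest tokens."""
    out = []
    for i, (x_i, pose_i) in enumerate(zip(feats, poses)):
        q = matvec(w_q, x_i)
        keys, vals = [], []
        for j in knn(poses, i, k):
            rpe = matvec(w_rpe, relative_pose(pose_i, poses[j]))
            # Assumption: the encoded relative pose is added to the neighbor
            # feature before the key and value projections.
            kv_in = [a + b for a, b in zip(feats[j], rpe)]
            keys.append(matvec(w_k, kv_in))
            vals.append(matvec(w_v, kv_in))
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, key)) / scale for key in keys]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[d] for e, v in zip(exps, vals))
                    for d in range(len(q))])
    return out
```

Because only relative poses enter the computation, rotating and translating every token by the same rigid transform should leave the output of this sketch unchanged up to floating-point error. This is the property that lets one set of context tokens be shared among agents instead of being re-normalized for each agent.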