Human trajectory prediction is the practical task of predicting the future positions of pedestrians on the road, and it typically covers all temporal ranges within a trajectory, from short-term to long-term. However, existing works address the entire trajectory prediction with a single, uniform training paradigm, neglecting the distinction between short-term and long-term dynamics in human trajectories. To overcome this limitation, we introduce a novel Progressive Pretext Task learning (PPT) framework, which progressively enhances the model's capacity to capture short-term dynamics and long-term dependencies for the final task of predicting the entire trajectory. Specifically, we design three stages of training tasks in the PPT framework. In the first stage, the model learns short-term dynamics through a stepwise next-position prediction task. In the second stage, the model is further enhanced to understand long-term dependencies through a destination prediction task. In the final stage, the model predicts the entire future trajectory by taking full advantage of the knowledge from the previous stages. To alleviate knowledge forgetting, we further apply cross-task knowledge distillation. Additionally, we design a Transformer-based trajectory predictor that achieves highly efficient two-step reasoning by integrating a destination-driven prediction strategy and a group of learnable prompt embeddings. Extensive experiments on popular benchmarks demonstrate that our proposed approach achieves state-of-the-art performance with high efficiency. Code is available at https://github.com/iSEE-Laboratory/PPT.
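The three training stages described above can be illustrated by how each one would draw its supervision from a single trajectory. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the squared-error loss, the `weight` parameter, and the specific form of the distillation term (pulling the final predicted point toward a destination produced by a frozen Stage 2 model) are all hypothetical choices made for illustration. The abstract does not specify these details.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def next_position_samples(traj: List[Point]) -> List[Tuple[List[Point], Point]]:
    """Stage 1 (short-term dynamics): each prefix is paired with the position that follows it."""
    return [(traj[:t], traj[t]) for t in range(1, len(traj))]


def destination_sample(traj: List[Point], obs_len: int) -> Tuple[List[Point], Point]:
    """Stage 2 (long-term dependencies): the observed part is paired with the final position."""
    return traj[:obs_len], traj[-1]


def full_future_sample(traj: List[Point], obs_len: int) -> Tuple[List[Point], List[Point]]:
    """Stage 3 (entire trajectory): the observed part is paired with all future positions."""
    return traj[:obs_len], traj[obs_len:]


def l2_loss(pred: List[Point], target: List[Point]) -> float:
    """Mean squared Euclidean distance between two equally long point lists."""
    assert len(pred) == len(target) and len(pred) > 0
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, target))
    return total / len(pred)


def stage3_loss(student_pred: List[Point], ground_truth: List[Point],
                teacher_dest: Point, weight: float = 1.0) -> float:
    """Task loss plus an ASSUMED distillation term.

    The distillation term keeps the student's last predicted point close to the
    destination predicted by a frozen Stage 2 model (hypothetical formulation).
    """
    task = l2_loss(student_pred, ground_truth)
    distill = l2_loss([student_pred[-1]], [teacher_dest])
    return task + weight * distill


# Toy trajectory: a pedestrian walking along the x-axis, 2 observed steps.
traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
stage1 = next_position_samples(traj)
stage2 = destination_sample(traj, obs_len=2)
stage3 = full_future_sample(traj, obs_len=2)
loss = stage3_loss([(2.0, 0.0), (3.0, 1.0)], stage3[1],
                   teacher_dest=(3.0, 0.0), weight=0.5)
```

In this sketch the same trajectory yields many short-horizon samples for Stage 1, one long-horizon sample for Stage 2, and one full-sequence sample for Stage 3, which mirrors the short-term to long-term progression the framework describes.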