Accurately predicting how agents move in dynamic scenes is essential for safe autonomous driving. State-of-the-art motion forecasting models rely on datasets with manually annotated or post-processed trajectories. However, building these datasets is costly, largely manual, hard to scale, and difficult to reproduce. They also introduce domain gaps that limit generalization across environments. We introduce PPT (Pretraining with Pseudo-labeled Trajectories), a simple and scalable pretraining framework that uses diverse, unprocessed trajectories generated automatically by off-the-shelf 3D detectors and trackers. Unlike data annotation pipelines that aim for clean, single-label annotations, PPT embraces these noisy off-the-shelf trajectories as useful signals for learning robust representations. With optional finetuning on a small amount of labeled data, models pretrained with PPT achieve strong performance across standard benchmarks, particularly in low-data regimes, and in cross-domain, end-to-end, and multi-class settings. PPT is easy to implement and improves generalization in motion forecasting.
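The two-stage recipe described above (pretrain on abundant pseudo-labeled trajectories, then optionally finetune on a small labeled set) can be sketched with a deliberately minimal toy model. Everything here is illustrative and hypothetical, not the authors' implementation: the "model" is just a constant-velocity step estimate, `mean_step`, `finetune`, and `forecast` are invented names, and the pseudo-labels stand in for detector-plus-tracker output.

```python
# Toy sketch of the PPT-style two-stage pipeline (hypothetical; the paper's
# models are learned neural forecasters, not constant-velocity estimators).

def mean_step(trajectories):
    """Average per-step (dx, dy) displacement over a set of 2D trajectories."""
    deltas = []
    for traj in trajectories:
        for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
            deltas.append((x1 - x0, y1 - y0))
    n = len(deltas)
    return (sum(d[0] for d in deltas) / n, sum(d[1] for d in deltas) / n)

def finetune(step, labeled, weight=0.2):
    """Blend the pretrained step estimate toward the labeled-data estimate."""
    lx, ly = mean_step(labeled)
    return ((1 - weight) * step[0] + weight * lx,
            (1 - weight) * step[1] + weight * ly)

def forecast(last_point, step, horizon):
    """Roll the constant-velocity model forward `horizon` steps."""
    x, y = last_point
    out = []
    for _ in range(horizon):
        x, y = x + step[0], y + step[1]
        out.append((x, y))
    return out

# Pseudo-labeled trajectories: noisy but abundant (imagined tracker output).
pseudo = [[(0, 0), (1.1, 0), (2.0, 0.1)],
          [(0, 0), (0.9, -0.1), (2.1, 0.0)]]
# Labeled trajectories: clean but scarce.
labeled = [[(0, 0), (1, 0), (2, 0)]]

step = mean_step(pseudo)        # "pretraining" on pseudo-labels
step = finetune(step, labeled)  # optional finetuning on labels
print(forecast((2, 0), step, horizon=2))
```

The point of the sketch is the data flow, not the model: pseudo-labels carry most of the signal, and a small labeled set only nudges the estimate.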