Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, mixed human-robot environments. However, the scarcity of large-scale driving datasets has hindered the development of robust and generalizable motion prediction models, limiting their ability to capture complex interactions and road geometries. Inspired by recent advances in natural language processing (NLP) and computer vision (CV), self-supervised learning (SSL) has gained significant attention in the motion prediction community for learning rich and transferable scene representations. Nonetheless, existing pre-training methods for motion prediction have largely focused on specific model architectures and a single dataset, limiting their scalability and generalizability. To address these challenges, we propose SmartPretrain, a general and scalable SSL framework for motion prediction that is both model-agnostic and dataset-agnostic. Our approach integrates contrastive and reconstructive SSL, leveraging the strengths of both generative and discriminative paradigms to effectively represent spatiotemporal evolution and interactions without imposing architectural constraints. Additionally, SmartPretrain employs a dataset-agnostic scenario sampling strategy that integrates multiple datasets, enhancing data volume, diversity, and robustness. Extensive experiments on multiple datasets demonstrate that SmartPretrain consistently improves the performance of state-of-the-art prediction models across datasets, data splits, and main metrics. For instance, SmartPretrain significantly reduces the MissRate of Forecast-MAE by 10.6%. These results highlight SmartPretrain's effectiveness as a unified, scalable solution for motion prediction, breaking free from the limitations of the small-data regime. Code is available at https://github.com/youngzhou1999/SmartPretrain.