Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect and track, from sensor data (cameras or LiDARs), the past trajectories of the different elements of the scene and predict their future locations. We depart from the current trend of tackling this task via end-to-end training from perception to forecasting, and instead use a modular approach: we build and train detection, tracking and forecasting modules individually, then apply only consecutive finetuning steps to better integrate the modules and alleviate compounding errors. An in-depth study of finetuning strategies reveals that this simple yet effective approach significantly improves performance on the end-to-end forecasting benchmark. Our solution ranks first in the Argoverse 2 End-to-end Forecasting Challenge with 63.82 mAPf, surpassing last year's winner by +17.1 points and this year's runner-up by +13.3 points in forecasting. We attribute this performance to our modular paradigm which, combined with finetuning strategies, significantly outperforms its end-to-end-trained counterparts. The code, model weights and results are available at https://github.com/valeoai/valeo4cast.
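The abstract does not specify the module interfaces or the exact finetuning recipe. The following is a minimal Python sketch, under our own assumptions, of the general idea. Separately built modules are chained at inference time, and one plausible finetuning step builds the forecaster's training pairs from the upstream modules' *predicted* tracks instead of ground-truth histories, so that the forecaster sees realistic upstream errors. All names (`detect`, `track`, `forecast`, `run_pipeline`, `build_finetuning_set`, `gt_agents`) and the toy modules are hypothetical stand-ins, not the valeo4cast code.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Track:
    track_id: int
    past: List[Point]  # estimated past positions, oldest first


def detect(frame: dict) -> List[Point]:
    # Hypothetical toy detector: returns annotated objects with a constant
    # 0.5 m x-offset, standing in for imperfect perception.
    return [(x + 0.5, y) for (x, y) in frame["objects"]]


def track(detections_per_frame: List[List[Point]]) -> List[Track]:
    # Hypothetical toy tracker: assumes every frame lists the same objects
    # in the same order, so object i in each frame belongs to track i.
    n = len(detections_per_frame[0])
    return [Track(i, [dets[i] for dets in detections_per_frame]) for i in range(n)]


def forecast(t: Track, horizon: int = 3) -> List[Point]:
    # Hypothetical toy forecaster: constant-velocity extrapolation
    # from the last two estimated positions.
    (x0, y0), (x1, y1) = t.past[-2], t.past[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def run_pipeline(frames, detector=detect, tracker=track, forecaster=forecast):
    # Modular inference: each stage only consumes the previous stage's output.
    tracks = tracker([detector(f) for f in frames])
    return {t.track_id: forecaster(t) for t in tracks}


def build_finetuning_set(frames, gt_agents: Dict[str, Tuple[Point, List[Point]]],
                         detector=detect, tracker=track, max_dist: float = 2.0):
    """Pair predicted tracks with the GT future of the nearest GT agent.

    gt_agents maps an agent id to (last observed GT position, GT future).
    Matching is greedy per track and not one-to-one (a simplification).
    """
    pairs = []
    for t in tracker([detector(f) for f in frames]):
        best_id, best_d = None, max_dist
        for agent_id, (last_pos, _) in gt_agents.items():
            d = math.dist(t.past[-1], last_pos)
            if d < best_d:
                best_id, best_d = agent_id, d
        if best_id is not None:  # unmatched tracks (e.g. false positives) are skipped
            pairs.append((t, gt_agents[best_id][1]))
    return pairs


# Usage with two toy frames containing two objects each.
frames = [{"objects": [(0.0, 0.0), (10.0, 5.0)]},
          {"objects": [(1.0, 0.0), (10.0, 6.0)]}]
gt_agents = {"car": ((1.0, 0.0), [(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]),
             "ped": ((10.0, 6.0), [(10.0, 7.0), (10.0, 8.0), (10.0, 9.0)])}
predictions = run_pipeline(frames)
finetune_pairs = build_finetuning_set(frames, gt_agents)
```

In this sketch the forecaster's finetuning inputs carry the detector's 0.5 m offset, while the targets remain ground truth. A realistic setup would replace the nearest-position matching with a proper one-to-one association between predicted tracks and annotated agents.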