Occupancy maps are widely recognized as an efficient representation for robot motion planning in static environments. Intelligent vehicles, however, need occupancy for both the present and future moments to drive safely. In the automotive industry, accurate and continuous prediction of future occupancy maps in traffic scenarios remains a formidable challenge. This paper systematically investigates multi-sensor spatio-temporal fusion strategies for continuous occupancy prediction and presents FusionMotion, a novel bird's-eye-view (BEV) occupancy predictor that fuses asynchronous multi-sensor data and predicts future occupancy maps at variable time intervals and over variable temporal horizons. Notably, FusionMotion applies neural ordinary differential equations (ODEs) to recurrent neural networks for occupancy prediction. FusionMotion learns derivatives of BEV features over temporal horizons and, at each ODE step, updates the implicit sensor's BEV feature measurements and propagates future states. Extensive experiments on the large-scale nuScenes and Lyft L5 datasets demonstrate that FusionMotion significantly outperforms previous methods. In addition, it outperforms the BEVFusion-style fusion strategy on the Lyft L5 dataset while reducing synchronization requirements. Code and models will be made available.
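The combination of neural ODEs with a recurrent update described above can be illustrated with a small sketch. It is not the FusionMotion implementation: the feature size `D`, the random weights `W_f`, `W_z`, `W_c`, the fixed-step Euler integrator, and the GRU-like gate are all assumptions chosen only to show the general pattern of propagating a hidden BEV-like state through continuous time and correcting it whenever a measurement from any sensor arrives.

```python
import math
import random

random.seed(0)
D = 4  # length of a toy flattened BEV feature vector (hypothetical)


def rand_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Hypothetical parameters; a real model would learn these from data.
W_f = rand_matrix(D, D)      # derivative network
W_z = rand_matrix(D, 2 * D)  # update gate
W_c = rand_matrix(D, 2 * D)  # candidate state


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def derivative(h):
    """Toy stand-in for the learned time derivative dh/dt of the feature."""
    return [math.tanh(a) for a in matvec(W_f, h)]


def propagate(h, t0, t1, max_step=0.05):
    """Integrate the state from t0 to t1 with fixed-step explicit Euler."""
    n_steps = max(1, math.ceil((t1 - t0) / max_step))
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        h = [hi + dt * di for hi, di in zip(h, derivative(h))]
    return h


def update(h, x):
    """GRU-like gated blend of the propagated state with a measurement x."""
    hx = h + x  # list concatenation: [state, measurement]
    z = [sigmoid(a) for a in matvec(W_z, hx)]
    c = [math.tanh(a) for a in matvec(W_c, hx)]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]


def predict(measurements, future_times):
    """measurements: time-sorted (timestamp, feature) pairs from any sensor.
    future_times: sorted timestamps after the last measurement."""
    h = [0.0] * D
    t = measurements[0][0]
    for t_obs, x in measurements:
        h = update(propagate(h, t, t_obs), x)  # propagate, then correct
        t = t_obs
    outputs = []
    for t_future in future_times:
        h = propagate(h, t, t_future)  # no measurement: propagate only
        outputs.append(list(h))
        t = t_future
    return outputs


# Irregularly spaced measurements and arbitrary prediction times.
measurements = [(0.0, [1.0] * D), (0.07, [0.0] * D), (0.1, [1.0] * D)]
future_states = predict(measurements, [0.5, 1.25, 2.0])
```

Because the state is integrated over the actual time gap between events, the measurement timestamps need not be aligned across sensors, and the prediction times can be chosen freely rather than on a fixed grid. A practical model would replace the toy vectors with BEV feature maps and decode each future state into an occupancy map.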