Outside of urban hubs, autonomous cars and trucks must master driving on intercity highways. Safe, long-distance highway travel at speeds exceeding 100 km/h demands perception distances of at least 250 m, roughly five times the 50-100 m typically addressed in city driving, to allow sufficient planning and braking margins. Increasing the perception range also allows autonomy to extend from light two-ton passenger vehicles to large forty-ton trucks, which require a longer planning horizon due to their high inertia. However, most existing perception approaches focus on shorter ranges and rely on Bird's Eye View (BEV) representations, whose memory and compute costs grow quadratically with distance. To overcome this limitation, we build on a sparse representation and introduce an efficient 3D encoding of multi-modal and temporal features, along with a novel self-supervised pre-training scheme that enables large-scale learning from unlabeled camera-LiDAR data. Our approach extends perception distances to 250 meters and, compared to existing methods, achieves a 26.6% improvement in mAP for object detection and a 30.5% reduction in Chamfer Distance for LiDAR forecasting. Project Page: https://light.princeton.edu/lrs4fusion/
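For reference, the LiDAR-forecasting metric reported above, Chamfer Distance, is commonly defined as the symmetric mean nearest-neighbor distance between two point clouds. The sketch below shows the standard squared-distance formulation; the paper's exact variant (e.g. squared vs. unsquared distances, per-range bucketing) may differ, so this is an illustrative definition rather than the authors' implementation.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point clouds a (N,3) and b (M,3).

    For each point in a, find the squared distance to its nearest neighbor
    in b, and vice versa; return the sum of the two mean distances.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Identical clouds give a distance of zero; forecasting error grows
# as predicted points drift from the observed sweep.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(pred, gt))  # 0.0
```

In practice, the brute-force (N, M) distance matrix above is replaced by a KD-tree or GPU nearest-neighbor search for full LiDAR sweeps.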