Collaborative perception in automated vehicles leverages the exchange of information between agents to improve perception results. Previous camera-based collaborative 3D perception methods typically represent the environment with 3D bounding boxes or bird's-eye-view (BEV) maps. However, these representations fall short of providing a comprehensive 3D prediction of the environment. To bridge this gap, we introduce the first method for collaborative 3D semantic occupancy prediction. Specifically, it improves local 3D semantic occupancy predictions through hybrid fusion of (i) semantic and occupancy task features and (ii) compressed orthogonal attention features shared between vehicles. Additionally, because no collaborative perception dataset is designed for semantic occupancy prediction, we augment an existing collaborative perception dataset with 3D collaborative semantic occupancy labels to enable a more robust evaluation. The experimental results show that (i) our collaborative semantic occupancy predictions outperform single-vehicle predictions by over 30%, and (ii) models built on semantic occupancy outperform state-of-the-art collaborative 3D detection techniques in downstream perception applications, offering higher accuracy and richer semantic awareness of road environments.
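The abstract does not specify how the orthogonal attention features are compressed before they are shared. As a minimal illustration of why orthogonal-plane features are cheaper to transmit than a full voxel volume, the sketch below uses a generic tri-plane (tri-perspective-view style) decomposition: a voxel feature volume of shape (X, Y, Z, C) is mean-pooled onto the XY, XZ, and YZ planes, and a receiver rebuilds an approximate volume by broadcasting and summing the planes. The mean pooling, the sum-based reconstruction, the function names, and the 100 x 100 x 8 x 16 grid are all assumptions chosen for illustration; this is not the fusion or compression scheme proposed here.

```python
import numpy as np

def project_to_orthogonal_planes(volume: np.ndarray):
    """Hypothetical helper: collapse an (X, Y, Z, C) voxel feature volume
    onto three orthogonal planes by mean-pooling along one spatial axis each."""
    xy = volume.mean(axis=2)  # (X, Y, C): pooled over height Z
    xz = volume.mean(axis=1)  # (X, Z, C): pooled over Y
    yz = volume.mean(axis=0)  # (Y, Z, C): pooled over X
    return xy, xz, yz

def reconstruct_volume(xy: np.ndarray, xz: np.ndarray, yz: np.ndarray) -> np.ndarray:
    """Hypothetical receiver side: broadcast each plane back along its pooled
    axis and sum, yielding an approximate (X, Y, Z, C) volume."""
    return xy[:, :, None, :] + xz[:, None, :, :] + yz[None, :, :, :]

def payload_ratio(x: int, y: int, z: int) -> float:
    """Fraction of the full-volume payload needed to send the three planes
    (the channel count C cancels out)."""
    return (x * y + x * z + y * z) / (x * y * z)

# Illustrative grid: 100 x 100 cells in BEV, 8 height bins, 16 channels.
rng = np.random.default_rng(0)
volume = rng.random((100, 100, 8, 16), dtype=np.float32)
xy, xz, yz = project_to_orthogonal_planes(volume)
sent = xy.size + xz.size + yz.size
print(sent, volume.size, payload_ratio(100, 100, 8))
```

For this grid, the three planes hold 185,600 values against 1,280,000 for the full volume, about 14.5% of the payload. The saving grows with the number of height bins, at the cost of losing interactions that a sum of planes cannot represent, which is one motivation for adding attention or task-feature fusion on top of such a decomposition.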