3D semantic occupancy prediction is an emerging perception paradigm in autonomous driving, providing a voxel-level representation that captures both geometric detail and semantic category. However, its effectiveness in single-vehicle setups is inherently constrained by occlusions, limited sensor range, and narrow viewpoints. To address these limitations, collaborative perception enables the exchange of complementary information across agents, thereby improving the completeness and accuracy of predictions. Despite this potential, research on collaborative 3D semantic occupancy prediction has been hindered by the lack of dedicated datasets. To bridge this gap, we design a high-resolution semantic voxel sensor in CARLA to produce dense and comprehensive annotations. We further develop a baseline model that performs inter-agent feature fusion via spatial alignment and attention aggregation. In addition, we establish benchmarks with varying prediction ranges to systematically assess the impact of spatial extent on collaborative prediction. Experimental results demonstrate the superior performance of our baseline, with gains that grow as the prediction range expands. Our code is available at https://github.com/tlab-wide/Co3SOP.
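The inter-agent fusion described above can be sketched in two stages: warp a neighbor's feature map into the ego frame using the relative pose (spatial alignment), then let the ego feature attend over the aligned features (attention aggregation). The sketch below is a minimal illustration under assumed shapes and module names; it is not the paper's actual implementation, and all identifiers (`warp_to_ego`, `AttentionFusion`, the BEV feature layout) are hypothetical.

```python
# Hypothetical sketch of spatial alignment + attention aggregation for
# inter-agent feature fusion. Shapes and names are illustrative assumptions.
import torch
import torch.nn.functional as F


def warp_to_ego(neighbor_feat, ego_from_neighbor):
    """Resample a neighbor's BEV feature map (B, C, H, W) into the ego
    frame using a 2x3 affine transform expressed in normalized grid units."""
    theta = ego_from_neighbor[:, :2, :]  # (B, 2, 3) affine part
    grid = F.affine_grid(theta, list(neighbor_feat.shape), align_corners=False)
    return F.grid_sample(neighbor_feat, grid, align_corners=False)


class AttentionFusion(torch.nn.Module):
    """Per-cell attention: the ego feature is the query; the ego and each
    warped neighbor feature serve as keys/values, weighted by softmax."""

    def __init__(self, channels):
        super().__init__()
        self.q = torch.nn.Linear(channels, channels)
        self.k = torch.nn.Linear(channels, channels)
        self.v = torch.nn.Linear(channels, channels)

    def forward(self, ego_feat, neighbor_feats):
        # Stack ego + aligned neighbors: (B, N_agents, C, H, W)
        feats = torch.stack([ego_feat, *neighbor_feats], dim=1)
        B, N, C, H, W = feats.shape
        f = feats.permute(0, 3, 4, 1, 2).reshape(-1, N, C)  # (B*H*W, N, C)
        q = self.q(f[:, :1])   # ego feature at each cell is the query
        k = self.k(f)
        v = self.v(f)
        attn = torch.softmax((q @ k.transpose(1, 2)) / C**0.5, dim=-1)
        fused = (attn @ v).squeeze(1)  # (B*H*W, C)
        return fused.reshape(B, H, W, C).permute(0, 3, 1, 2)
```

In this sketch the alignment is a 2D affine warp over BEV features for brevity; a full voxel-level variant would instead resample 3D features with the 6-DoF relative pose before aggregation.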