We describe a camera-LiDAR fusion detector developed for the TUMTraf V2X cooperative 3D object detection track of the DriveX 2026 challenge. The detector combines features from three roadside cameras and a merged infrastructure-plus-vehicle point cloud in a shared bird's-eye-view space, and predicts boxes through a CenterPoint-style head trained with a generalized IoU regression loss and complemented by an IoU quality re-ranking head. Trained on the provided training and validation splits, the model reaches a 3D mAP of 0.85 on the public Codabench test split. While iterating on the system, we observed that 44 of the 50 test frames also appear, with their labels, in the released training (40 frames) and validation (4 frames) splits. We therefore conducted two additional studies to quantify how this overlap affects the final score: (1) a finetuning run that oversamples the 44 overlapping frames, reaching 0.89 mAP, and (2) a post-processing run that replaces the predictions on those frames with the released ground truth, reaching 0.99 mAP (uploaded to our Codabench account for testing but not published on the leaderboard). We report all three configurations together with their per-class results.
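The abstract does not state how the overlap between the test split and the released splits was identified. As a minimal sketch of one way such a check could be done, assuming that overlapping frames are byte-identical across splits, the snippet below hashes each frame and looks the test digests up among the digests of the released splits. The function names `content_digest` and `find_overlap`, and the in-memory mapping from frame name to raw bytes, are hypothetical and chosen only for illustration.

```python
import hashlib
from typing import Dict, Mapping


def content_digest(payload: bytes) -> str:
    """Return a SHA-256 hex digest of a frame's raw bytes."""
    return hashlib.sha256(payload).hexdigest()


def find_overlap(
    test_frames: Mapping[str, bytes],
    reference_splits: Mapping[str, Mapping[str, bytes]],
) -> Dict[str, str]:
    """Map each test frame name to the first released split with identical content.

    Assumption (not from the paper): duplicated frames are byte-identical,
    so comparing content hashes is enough to detect them.
    """
    # Index every released frame by its digest; keep the first split seen.
    digest_to_split: Dict[str, str] = {}
    for split_name, frames in reference_splits.items():
        for payload in frames.values():
            digest_to_split.setdefault(content_digest(payload), split_name)

    # Look up each test frame in that index.
    overlap: Dict[str, str] = {}
    for name, payload in test_frames.items():
        split = digest_to_split.get(content_digest(payload))
        if split is not None:
            overlap[name] = split
    return overlap


# Toy usage with made-up payloads standing in for frame files.
toy_test = {"t0": b"frame-a", "t1": b"frame-b", "t2": b"frame-c"}
toy_released = {"train": {"x": b"frame-a"}, "val": {"y": b"frame-c"}}
toy_overlap = find_overlap(toy_test, toy_released)
```

Counting the values of the returned mapping per split would yield a breakdown of the kind reported above (40 training, 4 validation). If frames were re-encoded or renamed between splits, a byte-level hash would miss them, and a comparison on timestamps or decoded content would be needed instead.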