We tackle the problem of localizing traffic surveillance cameras for cooperative perception. To overcome the lack of large-scale real-world intersection datasets, we introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in Carla. We also introduce TrafficLoc, a novel neural network that localizes traffic cameras within a 3D reference map using a coarse-to-fine matching pipeline. For image-point cloud feature fusion, we propose a novel Geometry-guided Attention Loss to address cross-modal viewpoint inconsistencies. During coarse matching, we propose Inter-Intra Contrastive Learning to achieve precise alignment while preserving the distinctiveness of local intra-features within image patch-point group pairs. In addition, we introduce Dense Training Alignment with a soft-argmax operator to take additional features into account when regressing the final position. Extensive experiments show that TrafficLoc improves localization accuracy over state-of-the-art image-to-point cloud registration methods by a large margin (up to 86%) on Carla Intersection and generalizes well to real-world data. TrafficLoc also achieves new state-of-the-art performance on the KITTI and NuScenes datasets, demonstrating strong localization ability for both in-vehicle and traffic cameras. Our project page is publicly available at https://tum-luk.github.io/projects/trafficloc/.
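As an illustration of the soft-argmax operator mentioned above, the following is a minimal generic sketch in NumPy. It is not the TrafficLoc implementation: the function name `soft_argmax_2d`, the `temperature` parameter, and the use of a 2D score map are assumptions made for this example. The idea is that a softmax turns matching scores into weights, and the predicted position is the weighted average of all grid coordinates, so every location contributes to the regressed position rather than only the single best match.

```python
import numpy as np


def soft_argmax_2d(scores, temperature=1.0):
    """Expected (row, col) position under a softmax over a 2D score map.

    Hypothetical helper for illustration; `temperature` controls how
    sharply the weights concentrate on the highest score.
    """
    h, w = scores.shape
    logits = scores.reshape(-1) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()  # softmax weights over all h * w positions

    # Coordinates of every cell, flattened in the same order as `scores`.
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([rows.reshape(-1), cols.reshape(-1)], axis=1).astype(float)

    # Weighted average of coordinates: a differentiable stand-in for argmax.
    return probs @ coords


if __name__ == "__main__":
    demo = np.zeros((5, 7))
    demo[2, 3] = 1.0
    print(soft_argmax_2d(demo, temperature=0.01))
```

With a low temperature the result approaches the hard argmax location, while a uniform score map yields the centre of the grid. Unlike a hard argmax, the output varies smoothly with the scores, which is what makes the operator usable inside a training loss.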