State-of-the-art 3D object detection with sensor fusion relies heavily on calibration quality, which is difficult to maintain in large-scale deployments outside a lab environment. We present the first calibration-free approach for 3D object detection, eliminating the need for complex and costly calibration procedures. Our approach uses transformers to map features between multiple views of different sensors at multiple abstraction levels. In an extensive evaluation for object detection, we show not only that our approach outperforms single-modality setups by 14.1% in BEV mAP, but also that the transformer indeed learns the mapping. By showing that calibration is not necessary for sensor fusion, we hope to motivate other researchers to follow the direction of calibration-free fusion. Additionally, the resulting approaches are substantially resilient to changes in rotation and translation.