The success of deep neural networks (DNNs) on RGB-input perception tasks has opened up many possibilities for non-RGB-input perception tasks, such as object detection from wireless signals, LiDAR scans, and infrared images. Compared to the mature development pipeline for RGB-input (source-modality) models, developing non-RGB-input (target-modality) models from scratch poses substantial challenges: it requires modality-specific network design and training tricks, as well as the labor of annotating target-modality data. In this paper, we propose ModAlity Calibration (MAC), an efficient pipeline for calibrating target-modality inputs to DNN object detection models developed on the RGB (source) modality. We compose a target-modality-input model by adding a small calibrator module ahead of a source-modality model, and we introduce MAC training techniques that impose dense supervision on the calibrator. By leveraging (1) prior knowledge synthesized from the source-modality model and (2) paired {target, source} data with zero manual annotations, our target-modality models reach metrics comparable to or better than those of baseline models that require 100% manual annotations. We demonstrate the effectiveness of MAC by composing WiFi-input, LiDAR-input, and thermal-infrared-input models on top of pre-trained RGB-input models.
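The composition described in the abstract (a small trainable calibrator placed ahead of a frozen source-modality model, supervised by the source model's own outputs on paired data) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in and not the MAC implementation: the "source model" is a fixed linear scorer rather than a detector, the calibrator is a 2x2 linear map, the sensor relation in `make_pairs` is invented, and the loss matches only final outputs, which is much weaker than the dense supervision the paper refers to.

```python
import random

# Hypothetical frozen weights standing in for a pre-trained RGB-input model.
W = (0.5, -1.0)

def frozen_source_model(x):
    """Stand-in for the source-modality model; its weights are never updated."""
    return W[0] * x[0] + W[1] * x[1]

def calibrate(c, t):
    """Calibrator: maps a target-modality input t to a source-like input."""
    return (c[0][0] * t[0] + c[0][1] * t[1],
            c[1][0] * t[0] + c[1][1] * t[1])

def make_pairs(n, seed=0):
    """Paired {target, source} samples with no manual labels (assumed sensor relation)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        s = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        t = (2.0 * s[0] + s[1], s[0] - s[1])
        pairs.append((t, s))
    return pairs

def mean_loss(c, pairs):
    """Mean squared gap between the composed model on t and the source model on s."""
    total = 0.0
    for t, s in pairs:
        err = frozen_source_model(calibrate(c, t)) - frozen_source_model(s)
        total += err * err
    return total / len(pairs)

def train_calibrator(pairs, lr=0.05, epochs=200):
    """Train only the calibrator by SGD; the source model supplies the targets."""
    c = [[1.0, 0.0], [0.0, 1.0]]  # start from the identity map
    for _ in range(epochs):
        for t, s in pairs:
            err = frozen_source_model(calibrate(c, t)) - frozen_source_model(s)
            # d(err^2)/dc[i][j] = 2 * err * W[i] * t[j]
            for i in range(2):
                for j in range(2):
                    c[i][j] -= lr * 2.0 * err * W[i] * t[j]
    return c

if __name__ == "__main__":
    pairs = make_pairs(50)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print("loss before:", mean_loss(identity, pairs))
    print("loss after: ", mean_loss(train_calibrator(pairs), pairs))
```

The point of the sketch is the division of labor: only the calibrator's parameters change, and the training signal comes entirely from the frozen source model evaluated on the paired source input, so no human annotation enters the loop.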