MAC: ModAlity Calibration for Object Detection

The success of deep neural networks (DNNs) on RGB-input perception tasks has opened up many possibilities for non-RGB-input perception tasks, such as object detection from wireless signals, lidar scans, and infrared images. Compared to the mature development pipeline of RGB-input (source-modality) models, developing non-RGB-input (target-modality) models from scratch poses substantial challenges: modality-specific network design and training tricks must be worked out, and annotating target-modality data is labor-intensive. In this paper, we propose ModAlity Calibration (MAC), an efficient pipeline for calibrating target-modality inputs to DNN object detection models developed on the RGB (source) modality. We compose a target-modality-input model by adding a small calibrator module ahead of a source-modality model, and we introduce MAC training techniques that impose dense supervision on the calibrator. By leveraging (1) prior knowledge synthesized from the source-modality model and (2) paired {target, source} data with zero manual annotations, our target-modality models reach comparable or better metrics than baseline models that require 100% manual annotations. We demonstrate the effectiveness of MAC by composing WiFi-input, lidar-input, and thermal-infrared-input models on top of pre-trained RGB-input models.
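The composition described above (a small calibrator placed ahead of a frozen source-modality model, trained on paired data without manual labels) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the "source model" is a fixed scalar function standing in for a pre-trained RGB detector, the calibrator is a two-parameter linear map, the pairing between modalities is synthetic, and the loss is a plain squared error between the frozen model's output on the source input and its output on the calibrated target input. The actual MAC training techniques are not specified in the abstract and may differ substantially.

```python
import random

SOURCE_SLOPE = 3.0  # hypothetical frozen weight of the stand-in source model


def source_model(x):
    """Frozen stand-in for a pre-trained source-modality (RGB) model."""
    return SOURCE_SLOPE * x - 1.0


def make_pairs(n, seed=0):
    """Synthetic paired {target, source} samples; no manual labels involved.

    The hidden relation x = 0.5 * t + 2.0 is an assumption for this toy only.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        pairs.append((t, 0.5 * t + 2.0))
    return pairs


def train_calibrator(pairs, lr=0.05, epochs=500):
    """Fit calibrator g(t) = p * t + q so that source_model(g(t)) matches
    source_model(x) on each pair. Only p and q are updated; the source
    model stays frozen and supplies the supervision signal."""
    p, q = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        grad_p = grad_q = 0.0
        for t, x in pairs:
            teacher = source_model(x)          # output on the source input
            student = source_model(p * t + q)  # output on the calibrated target input
            err = student - teacher
            # Hand-written chain rule through the frozen linear model;
            # a real implementation would rely on automatic differentiation.
            grad_p += 2.0 * err * SOURCE_SLOPE * t
            grad_q += 2.0 * err * SOURCE_SLOPE
        p -= lr * grad_p / n
        q -= lr * grad_q / n
    return p, q


def target_model(t, p, q):
    """Composed target-modality model: calibrator followed by the frozen source model."""
    return source_model(p * t + q)


if __name__ == "__main__":
    p, q = train_calibrator(make_pairs(200))
    print(p, q)
```

In this toy, the calibrator recovers the hidden mapping between modalities purely from the frozen model's outputs on paired inputs, which is the sense in which no manual annotation is needed. A realistic version would replace the scalar functions with a detector network and a learned calibrator module.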

