Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection

The primary bottleneck to good recognition performance in infrared (IR) images is the lack of sufficient labeled training data, owing to the cost of acquiring such data. Object detection methods for the RGB modality are quite robust (at least for some commonplace classes, such as person and car), thanks to the very large training sets that exist. In this work, we therefore seek to leverage cues from the RGB modality to scale object detectors to the IR modality while preserving model performance in the RGB modality. At the core of our method is a novel tensor decomposition method called TensorFact, which splits the convolution kernels of a layer of a Convolutional Neural Network (CNN) into low-rank factor matrices with fewer parameters than the original CNN. We first pre-train these factor matrices on the RGB modality, for which plenty of training data are assumed to exist, and then augment them with only a few trainable parameters for training on the IR modality to avoid over-fitting, while encouraging these new parameters to capture cues complementary to those trained only on the RGB modality. We validate our approach empirically by first assessing how well our TensorFact-decomposed network detects objects in RGB images vis-a-vis the original network, and then examining how well it adapts to IR images of the FLIR ADAS v1 dataset. For the latter, we train models under scenarios that pose challenges stemming from data paucity. From the experiments, we observe that: (i) TensorFact shows performance gains on RGB images; (ii) further, this pre-trained model, when fine-tuned, outperforms a standard state-of-the-art object detector on the FLIR ADAS v1 dataset by about 4% in terms of mAP 50 score.
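To make the idea of low-rank kernel factors concrete, the following is a minimal NumPy sketch. It is not the TensorFact decomposition itself, whose exact form the abstract does not specify. As a stand-in, it assumes the kernel is flattened to a matrix and factored by truncated SVD. The function names, the layer shape, the rank, the number of added IR components, and the overlap penalty are all hypothetical choices made for illustration.

```python
import numpy as np


def factorize_kernel(kernel, rank):
    """Split a conv kernel (out_ch, in_ch, kh, kw) into two low-rank factors.

    Assumption: truncated SVD of the flattened kernel stands in for the
    paper's decomposition, which the abstract does not detail.
    """
    out_ch = kernel.shape[0]
    mat = kernel.reshape(out_ch, -1)                 # (out_ch, in_ch*kh*kw)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    left = u[:, :rank] * s[:rank]                    # (out_ch, rank)
    right = vt[:rank, :]                             # (rank, in_ch*kh*kw)
    return left, right


def rebuild_kernel(left, right, shape):
    """Multiply the factors and reshape back to a conv kernel."""
    return (left @ right).reshape(shape)


def overlap_penalty(right_rgb, right_ir):
    """Hypothetical regularizer: squared overlap between IR and RGB factors.

    Smaller values mean the IR rows point in directions the RGB rows
    do not already cover, i.e. they are more complementary.
    """
    return float(np.sum((right_rgb @ right_ir.T) ** 2))


rng = np.random.default_rng(0)
kernel = rng.standard_normal((64, 32, 3, 3))         # hypothetical layer shape
rank, extra = 8, 2                                   # hypothetical ranks

# Stage 1: factors that would be learned on RGB data (here: from the SVD).
left_rgb, right_rgb = factorize_kernel(kernel, rank)

# Stage 2: keep the RGB factors fixed and add a few IR-specific components.
# Zero-initialising left_ir means the IR kernel starts equal to the RGB one.
left_ir = np.zeros((64, extra))
right_ir = 0.01 * rng.standard_normal((extra, 32 * 3 * 3))
kernel_ir = rebuild_kernel(
    np.hstack([left_rgb, left_ir]),
    np.vstack([right_rgb, right_ir]),
    kernel.shape,
)

full_params = kernel.size                            # 64*32*3*3
factored_params = left_rgb.size + right_rgb.size     # 64*8 + 8*288
ir_params = left_ir.size + right_ir.size             # 64*2 + 2*288
penalty = overlap_penalty(right_rgb, right_ir)

print(full_params, factored_params, ir_params)
```

In this sketch only `left_ir` and `right_ir` would be updated during IR training, so the number of parameters exposed to the small IR dataset stays a small fraction of the original layer. In an actual training loop, a term like `penalty` could be added to the detection loss to discourage the IR components from duplicating the RGB ones.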
