A common practice in deep learning is to train large neural networks on massive datasets so that they perform accurately across different domains and tasks. While this methodology works well in many application areas, it rarely carries over across modalities, because data captured with different sensors exhibits a larger distribution shift. This paper focuses on the problem of efficiently adapting a large object detection model to one or multiple modalities. To do so, we propose ModTr as an alternative to the common approach of fine-tuning large models. ModTr adapts the input with a small transformation network trained to minimize the detection loss directly. The original model can therefore operate on the translated inputs without any further change or fine-tuning of its parameters. Experimental results on translating from IR to RGB images on two well-known datasets show that this simple ModTr approach provides detectors that perform comparably to or better than standard fine-tuning, without forgetting the original knowledge. This opens the door to a more flexible and efficient service-based detection pipeline in which, instead of using a different detector for each modality, a single unaltered server runs constantly and multiple modalities query it through their corresponding translations. Code: https://github.com/heitorrapela/ModTr.
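The core idea, training only a small input translator against the loss of a frozen detector, can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the scalar linear `detector`, the affine `translate` function, the squared-error `detection_loss`, and the synthetic "IR" distortion are hypothetical stand-ins, not the networks, losses, or data used in ModTr (see the linked repository for the actual implementation).

```python
import random

# Frozen stand-in for a pretrained RGB detector (hypothetical: a scalar
# linear model). These parameters are never updated during training.
DET_W, DET_B = 2.0, -1.0


def detector(x):
    return DET_W * x + DET_B


def translate(x, a, c):
    """Hypothetical stand-in for the translation network: an affine map."""
    return a * x + c


def detection_loss(pred, target):
    """Stand-in for the detection loss (squared error in this toy)."""
    return (pred - target) ** 2


def mean_loss(inputs, targets, a, c):
    total = sum(detection_loss(detector(translate(x, a, c)), y)
                for x, y in zip(inputs, targets))
    return total / len(inputs)


def train_translator(ir_inputs, targets, lr=0.1, epochs=2000):
    """Gradient descent on the translator parameters (a, c) only."""
    a, c = 1.0, 0.0  # start from the identity translation
    n = len(ir_inputs)
    for _ in range(epochs):
        grad_a = grad_c = 0.0
        for x, y in zip(ir_inputs, targets):
            pred = detector(translate(x, a, c))
            # Backpropagate the loss through the frozen detector
            # into the translator output.
            d_translated = 2.0 * (pred - y) * DET_W
            grad_a += d_translated * x / n
            grad_c += d_translated / n
        a -= lr * grad_a
        c -= lr * grad_c
    return a, c


def demo(seed=0, n=50):
    rng = random.Random(seed)
    rgb = [rng.random() for _ in range(n)]
    targets = [detector(x) for x in rgb]   # what the detector yields on RGB
    ir = [0.5 * x + 0.3 for x in rgb]      # synthetic sensor shift (assumed)
    before = mean_loss(ir, targets, 1.0, 0.0)
    a, c = train_translator(ir, targets)
    after = mean_loss(ir, targets, a, c)
    return before, after, a, c


if __name__ == "__main__":
    before, after, a, c = demo()
    print(f"loss before: {before:.4f}, after: {after:.2e}, a={a:.3f}, c={c:.3f}")
```

Because only `a` and `c` change, the same frozen detector could in principle serve several modalities, each with its own translator, which is the service-style setup the abstract describes.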