Many real-world applications require recognition models that are robust to varying operational conditions and modalities, yet run on small embedded devices with limited hardware. While pre-training is known to be highly beneficial to the accuracy and robustness of normal-size models, its effect on small models, which can be deployed on embedded and edge devices, remains unclear. In this work, we investigate the effect of ImageNet pre-training on increasingly small backbone architectures (ultra-small models with fewer than 1M parameters) with respect to robustness in downstream object detection tasks in the infrared visual modality. Using scaling laws derived from standard object recognition architectures, we construct two ultra-small backbone families and systematically study their performance. Our experiments on three different datasets reveal that while ImageNet pre-training remains useful, beyond a certain capacity threshold it offers diminishing returns in out-of-distribution detection robustness. We therefore advise practitioners to still use pre-training and, when possible, to avoid overly small models: although they may work well on in-domain problems, they are brittle when operating conditions differ.
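To make the "ultra-small" regime concrete, the following is a minimal sketch of how a parameter budget can be used to derive a width-scaled backbone family. All numbers here (the toy 4-stage channel configuration, the width multipliers, the helper names `conv_params` and `backbone_params`) are illustrative assumptions, not the architectures or scaling laws used in the paper; the sketch only shows how a <1M-parameter budget constrains the admissible width multipliers.

```python
# Hypothetical sketch: selecting width multipliers for an ultra-small
# backbone family under a parameter budget. The channel configuration
# and multipliers are invented for illustration only.

def conv_params(c_in, c_out, k=3):
    """Parameter count of a single k x k convolution (weights + biases)."""
    return c_in * c_out * k * k + c_out

def backbone_params(width_mult, base_channels=(16, 32, 64, 128)):
    """Total parameters of a toy 4-stage conv backbone scaled by width_mult."""
    chans = [max(8, int(c * width_mult)) for c in base_channels]
    total, prev = 0, 3  # 3-channel input
    for c in chans:
        total += conv_params(prev, c)
        prev = c
    return total

# Scan candidate multipliers and keep only those under the 1M budget.
family = {m: backbone_params(m) for m in (0.25, 0.5, 1.0, 2.0, 4.0)}
ultra_small = {m: p for m, p in family.items() if p < 1_000_000}
```

In this toy setup the multiplier 4.0 already exceeds the 1M budget, while the smaller multipliers stay well under it, which mirrors how a family of progressively smaller backbones can be enumerated against a fixed deployment constraint.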