In this paper, we present YOLOX-ViT, a novel object detection model, and investigate the efficacy of knowledge distillation for reducing model size without sacrificing performance. Focusing on underwater robotics, our research addresses key questions about the viability of smaller models and the impact of the visual transformer layer in YOLOX. Furthermore, we introduce a new side-scan sonar image dataset and use it to evaluate our object detector's performance. Results show that knowledge distillation effectively reduces false positives in wall detection. Additionally, the introduced visual transformer layer significantly improves object detection accuracy in the underwater environment. The source code for knowledge distillation in YOLOX-ViT is available at https://github.com/remaro-network/KD-YOLOX-ViT.