Object detectors often lose accuracy when deployed under environmental conditions that are insufficiently represented in their training data. This paper studies how to automatically fine-tune a pre-trained object detector while an agent explores a new environment and acquires images, without relying on human intervention, i.e., in a self-supervised fashion. In our setting, the agent initially explores the environment using a pre-trained off-the-shelf detector to locate objects and assign them pseudo-labels. By assuming that pseudo-labels for the same object must be consistent across different views, we devise a novel mechanism for producing refined predictions from the consensus among observations. Our approach improves the mAP of the off-the-shelf object detector by 2.66% and outperforms the current state of the art without relying on ground-truth annotations.
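The cross-view consistency assumption can be illustrated with a minimal sketch. The abstract does not specify the consensus mechanism, so this is not the paper's method: it assumes a simple confidence-weighted vote, and the function name `consensus_pseudo_labels`, the `(object_id, class_name, score)` tuple layout, and the sample detections are all hypothetical. It also assumes that detections from different views have already been associated with the same physical object.

```python
from collections import defaultdict


def consensus_pseudo_labels(observations):
    """Return one refined label per object from its per-view detections.

    observations: iterable of (object_id, class_name, score) tuples,
    one tuple per view in which the object was detected.
    Assumption: a confidence-weighted vote stands in for the paper's
    (unspecified) consensus mechanism.
    """
    # Accumulate detector confidence per class, separately for each object.
    votes = defaultdict(lambda: defaultdict(float))
    for object_id, class_name, score in observations:
        votes[object_id][class_name] += score

    # Keep the class with the largest accumulated confidence.
    return {
        object_id: max(class_scores, key=class_scores.get)
        for object_id, class_scores in votes.items()
    }


# Hypothetical detections: obj_0 is seen from three views, and one view disagrees.
observations = [
    ("obj_0", "chair", 0.9),
    ("obj_0", "chair", 0.7),
    ("obj_0", "sofa", 0.8),
    ("obj_1", "tv", 0.6),
]
refined = consensus_pseudo_labels(observations)
```

In this sketch, the single "sofa" detection of `obj_0` is overruled by the two "chair" detections, and the refined labels could then serve as fine-tuning targets in place of ground-truth annotations.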