When an object detector is deployed in a novel setting, it often experiences a drop in performance. This paper studies how an embodied agent can automatically fine-tune a pre-existing object detector while exploring and acquiring images in a new environment, without relying on human intervention, i.e., in a fully self-supervised manner. In our setting, an agent initially learns to explore the environment using a pre-trained off-the-shelf detector to locate objects and assign them pseudo-labels. By assuming that pseudo-labels for the same object must be consistent across different views, we learn the exploration policy Look Around to mine hard samples, and we devise a novel mechanism called Disagreement Reconciliation that produces refined pseudo-labels from the consensus among observations. We implement a unified benchmark of the current state of the art and compare our approach with pre-existing exploration policies and perception mechanisms. Our method outperforms existing approaches, improving the object detector by 6.2% in a simulated scenario (a 3.59% gain over other state-of-the-art methods) and by 9.97% in a real robotic test, without relying on ground truth. Code for the proposed approach and baselines is available at https://iit-pavis.github.io/Look_Around_And_Learn/.
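The multi-view consistency idea underlying Disagreement Reconciliation can be illustrated with a toy sketch. This is not the paper's actual mechanism; it is a minimal, hypothetical illustration assuming each physical object has been associated across views and each view contributes a (class label, confidence) pseudo-label. The consensus label is taken by confidence-weighted majority vote, and views that disagree with it are flagged as candidate hard samples.

```python
from collections import defaultdict

def reconcile_pseudo_labels(view_predictions):
    """Confidence-weighted majority vote over per-view pseudo-labels
    of a single object instance.

    view_predictions: list of (class_label, confidence) tuples,
    one per view of the same physical object.
    Returns the consensus label and the indices of disagreeing
    views (candidate hard samples for fine-tuning).
    """
    scores = defaultdict(float)
    for label, conf in view_predictions:
        scores[label] += conf
    consensus = max(scores, key=scores.get)
    hard_views = [i for i, (label, _) in enumerate(view_predictions)
                  if label != consensus]
    return consensus, hard_views

# Example: three views of the same object, one misclassified.
views = [("chair", 0.9), ("chair", 0.7), ("table", 0.6)]
label, hard = reconcile_pseudo_labels(views)
# label == "chair"; view 2 is flagged as a hard sample
```

In this simplified view, the refined consensus label replaces the per-view pseudo-labels for fine-tuning, while disagreeing views are exactly the hard samples an exploration policy would seek out.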