We propose a novel self-supervised approach for learning to visually localize robots equipped with controllable LEDs. We rely on a few training samples labeled with position ground truth and on many training samples in which only the LED state is known, which are cheap to collect. We show that using LED state prediction as a pretext task significantly helps to learn the visual localization end task. The resulting model does not require knowledge of LED states during inference. We instantiate the approach to visual relative localization of nano-quadrotors: experimental results show that using our pretext task significantly improves localization accuracy (from 68.3% to 76.2%) and outperforms alternative strategies, such as a supervised baseline, model pre-training, and an autoencoding pretext task. We deploy our model aboard a 27-g Crazyflie nano-drone, running at 21 fps, in a position-tracking task of a peer nano-drone. Our approach, relying on position labels for only 300 images, yields a mean tracking error of 4.2 cm versus 11.9 cm for a supervised baseline model trained without our pretext task. Videos and code of the proposed approach are available at https://github.com/idsia-robotics/leds-as-pretext.
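To make the training setup concrete, below is a minimal sketch of joint training with LED-state prediction as a pretext task, assuming a shared encoder with a position head (supervised on the few labeled images) and an LED head (supervised on the many LED-labeled images). The network architecture, number of LEDs, loss choices, and loss weighting are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class LocalizerWithPretext(nn.Module):
    def __init__(self, num_leds: int = 4):
        super().__init__()
        # Shared convolutional encoder (assumed small CNN suitable for a nano-drone camera).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # End-task head: relative position of the peer drone (x, y, z).
        self.position_head = nn.Linear(32, 3)
        # Pretext head: on/off state of each controllable LED.
        self.led_head = nn.Linear(32, num_leds)

    def forward(self, images):
        features = self.encoder(images)
        return self.position_head(features), self.led_head(features)

model = LocalizerWithPretext()
pos_loss = nn.MSELoss()            # supervised loss on the few position-labeled samples
led_loss = nn.BCEWithLogitsLoss()  # pretext loss on the many LED-labeled samples
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(labeled_batch, pretext_batch, pretext_weight=1.0):
    """One joint update: position supervision plus LED-state pretext supervision."""
    imgs_l, positions = labeled_batch    # few images with position ground truth
    imgs_p, led_states = pretext_batch   # cheaply collected images with known LED states
    pred_pos, _ = model(imgs_l)
    _, pred_led = model(imgs_p)
    loss = pos_loss(pred_pos, positions) + pretext_weight * led_loss(pred_led, led_states)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference only the position head is used, so knowledge of the LED states is not required, consistent with the deployment described above.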