Event cameras provide low-latency perception for only milliwatts of power. This makes them highly suitable for resource-restricted, agile robots such as small flying drones. Self-supervised learning based on contrast maximization holds great potential for event-based robot vision, as it foregoes the need for high-frequency ground truth and allows for online learning in the robot's operational environment. However, online, onboard learning raises the major challenge of achieving sufficient computational efficiency for real-time learning while maintaining competitive visual perception performance. In this work, we improve the time and memory efficiency of the contrast maximization learning pipeline. Benchmarking experiments show that the proposed pipeline achieves competitive results with the state of the art on the task of depth estimation from events. Furthermore, we demonstrate the usability of the learned depth for obstacle avoidance through real-world flight experiments. Finally, we compare the performance of different combinations of pre-training and fine-tuning of the depth estimation networks, showing that onboard domain adaptation is feasible given a few minutes of flight.
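To make the self-supervision signal named above concrete, the following is a minimal sketch of the contrast-maximization idea: events are warped to a reference time according to a motion estimate, accumulated into an image of warped events (IWE), and the variance (contrast) of that image serves as the objective, since a correct motion estimate aligns events along edges into sharp structures. This is a simplified, dense-flow illustration, not the paper's pipeline; the function name and the single-flow-vector assumption are ours.

```python
import numpy as np

def contrast_loss(xs, ys, ts, flow, resolution):
    """Negative contrast of the image of warped events (IWE).

    xs, ys, ts : per-event pixel coordinates and timestamps.
    flow       : assumed single (vx, vy) flow vector for all events
                 (a real pipeline predicts per-pixel motion/depth).
    resolution : (H, W) of the event camera.
    """
    # Warp events back to reference time t = 0 along the flow.
    wx = xs - ts * flow[0]
    wy = ys - ts * flow[1]
    # Accumulate into the IWE (nearest-pixel for simplicity; learned
    # pipelines use bilinear splatting to keep this differentiable).
    iwe = np.zeros(resolution)
    ix = np.clip(np.round(wx).astype(int), 0, resolution[1] - 1)
    iy = np.clip(np.round(wy).astype(int), 0, resolution[0] - 1)
    np.add.at(iwe, (iy, ix), 1.0)
    # Contrast = variance of the IWE; minimizing the negative
    # variance maximizes contrast.
    return -np.var(iwe)
```

With events generated by a patch moving at a known velocity, the loss is lower (higher contrast) at the true flow than at zero flow, which is exactly the gradient signal a self-supervised depth or flow network can learn from.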