Drones, as advanced cyber-physical systems, are undergoing a transformative shift with the advent of vision-based learning, a field that is rapidly gaining prominence because of its impact on drone autonomy and functionality. Unlike existing task-specific surveys, this review offers a comprehensive overview of vision-based learning in drones and emphasizes its role in enhancing their operational capabilities. We start by explaining the fundamental principles of vision-based learning and how it improves drones' visual perception and decision-making. We then categorize vision-based control methods, from the perception-control perspective, into indirect, semi-direct, and end-to-end approaches. We further explore applications of vision-based drones with learning capabilities, ranging from single-agent systems to more complex multi-agent and heterogeneous systems, and highlight the challenges and innovations that characterize each area. Finally, we discuss open questions and potential solutions to guide ongoing research and development in this rapidly evolving field. With the rise of large language models (LLMs) and embodied intelligence, vision-based learning for drones offers a promising but challenging road towards artificial general intelligence (AGI) in the 3D physical world.