Drones, as advanced cyber-physical systems, are undergoing a transformative shift with the advent of vision-based learning, a field that is rapidly gaining prominence because of its impact on drone autonomy and functionality. Unlike existing task-specific surveys, this review offers a comprehensive overview of vision-based learning in drones, emphasizing its role in enhancing their operational capabilities across various scenarios. We start by explaining the fundamental principles of vision-based learning and how it improves drones' visual perception and decision-making. We then categorize vision-based control methods into indirect, semi-direct, and end-to-end approaches from the perception-control perspective. We further explore applications of vision-based drones with learning capabilities, ranging from single-agent systems to more complex multi-agent and heterogeneous systems, and underscore the challenges and innovations that characterize each area. Finally, we discuss open questions and potential solutions to guide ongoing research and development in this rapidly evolving field. With the rise of large language models (LLMs) and embodied intelligence, vision-based learning for drones provides a promising but challenging road towards artificial general intelligence (AGI) in the 3D physical world.