This paper addresses the problem of autonomous UAV search missions, where a UAV must locate specific Entities of Interest (EOIs) within a time limit, based on brief descriptions in large, hazard-prone environments with keep-out zones. The UAV must perceive, reason, and make decisions with limited and uncertain information. We propose NEUSIS, a compositional neuro-symbolic system designed for interpretable UAV search and navigation in realistic scenarios. NEUSIS integrates neuro-symbolic visual perception, reasoning, and grounding (GRiD) to process raw sensory inputs, maintains a probabilistic world model for environment representation, and uses a hierarchical planning component (SNaC) for efficient path planning. Experimental results from simulated urban search missions using AirSim and Unreal Engine show that NEUSIS outperforms a state-of-the-art (SOTA) vision-language model and a SOTA search planning model in success rate, search efficiency, and 3D localization. These results demonstrate the effectiveness of our compositional neuro-symbolic approach in handling complex, real-world scenarios, making it a promising solution for autonomous UAV systems in search missions.
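
To make the idea of a probabilistic world model concrete, the following is a minimal sketch of a grid-based Bayesian belief update for a single target, paired with a greedy choice of the next cell to visit. It is not the NEUSIS implementation: the grid representation, the function names `update_belief` and `next_cell`, the detector parameters `p_detect` and `p_false`, and the greedy selection rule are all assumptions made for illustration, and the greedy rule stands in for (and is much simpler than) the hierarchical planner described above.

```python
import numpy as np


def update_belief(belief, observed, detected, p_detect=0.8, p_false=0.1):
    """Bayes update of a per-cell target-location belief after one observation.

    belief   : 2D float array, prior probability that the target is in each cell
    observed : 2D bool array, True for cells inside the sensor footprint
    detected : True if the (hypothetical) detector fired on this footprint
    p_detect : assumed probability of a detection when the target is in view
    p_false  : assumed probability of a detection when the target is not in view
    """
    if detected:
        # Likelihood of a detection, conditioned on the target being in each cell.
        likelihood = np.where(observed, p_detect, p_false)
    else:
        # Likelihood of no detection, conditioned on the target being in each cell.
        likelihood = np.where(observed, 1.0 - p_detect, 1.0 - p_false)
    posterior = belief * likelihood
    return posterior / posterior.sum()


def next_cell(belief, keep_out):
    """Greedy choice: the highest-belief cell that is not in a keep-out zone."""
    masked = np.where(keep_out, -np.inf, belief)
    return np.unravel_index(np.argmax(masked), belief.shape)


if __name__ == "__main__":
    belief = np.full((4, 4), 1.0 / 16)        # uniform prior over a 4x4 grid
    keep_out = np.zeros((4, 4), dtype=bool)
    keep_out[3, 3] = True                     # one forbidden cell
    footprint = np.zeros((4, 4), dtype=bool)
    footprint[0, 0] = True                    # the UAV looks at cell (0, 0)

    belief = update_belief(belief, footprint, detected=False)
    print("belief at (0, 0) after a miss:", belief[0, 0])
    print("next cell to visit:", next_cell(belief, keep_out))
```

A missed detection lowers the belief in the observed cell and redistributes probability mass to the rest of the grid, while a detection does the opposite. Keep-out zones are handled here only at selection time, so a target may still be believed to lie inside one even though the UAV is never sent there.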