AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents

Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on predefined command sequences or task-specific pipelines in which developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open-source software framework that enables UAVs to operate as decision-making aerial agents rather than merely command-following platforms. Given a natural-language mission, AerialClaw lets an LLM-based agent understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain-skill-runtime architecture that combines hard skills for atomic UAV operations, Markdown-based soft skills for reusable task strategies, document-driven agent state and capability boundaries, memory-driven reflection, safety-oriented runtime validation, and platform-agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim-based simulation, and ships with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. Together, these components provide a reproducible and extensible open-source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.
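
To make the closed loop described above concrete, the following is a minimal sketch under stated assumptions, not AerialClaw's actual API. Every name in it (`MockDrone`, `SKILLS`, `validate`, `scripted_policy`, `run_mission`, and the 50 m altitude limit) is hypothetical and chosen for illustration only. A scripted policy stands in for the LLM so the example runs without a model backend; it shows hard skills as atomic operations, a runtime validation step that enforces a capability boundary, and a memory of observations that the next decision is conditioned on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# One memory entry: (skill name, arguments, observation text).
Step = Tuple[str, dict, str]
Decision = Optional[Tuple[str, dict]]


@dataclass
class MockDrone:
    """Hypothetical stand-in for a mock execution adapter."""
    altitude_m: float = 0.0


# Hard skills (illustrative): atomic operations that return an observation.
def takeoff(drone: MockDrone, altitude_m: float) -> str:
    drone.altitude_m = altitude_m
    return f"airborne at {altitude_m} m"


def land(drone: MockDrone) -> str:
    drone.altitude_m = 0.0
    return "landed"


SKILLS: Dict[str, Callable[..., str]] = {"takeoff": takeoff, "land": land}
MAX_ALTITUDE_M = 50.0  # assumed capability boundary, not taken from the paper


def validate(name: str, args: dict) -> Tuple[bool, str]:
    """Runtime validation: reject unknown skills and out-of-bounds arguments."""
    if name not in SKILLS:
        return False, f"unknown skill {name}"
    if name == "takeoff" and args.get("altitude_m", 0.0) > MAX_ALTITUDE_M:
        return False, "altitude exceeds limit"
    return True, "ok"


def scripted_policy(mission: str, memory: List[Step]) -> Decision:
    """Placeholder for the LLM: chooses the next skill from past observations."""
    if not memory:
        return "takeoff", {"altitude_m": 80.0}  # deliberately too high
    last_observation = memory[-1][2]
    if last_observation.startswith("rejected"):
        return "takeoff", {"altitude_m": 30.0}  # adapt after the rejection
    if last_observation.startswith("airborne"):
        return "land", {}
    return None  # mission considered complete


def run_mission(mission: str,
                policy: Callable[[str, List[Step]], Decision],
                drone: MockDrone,
                max_steps: int = 10) -> List[Step]:
    """Closed loop: decide, validate, execute, observe, remember."""
    memory: List[Step] = []
    for _ in range(max_steps):
        decision = policy(mission, memory)
        if decision is None:
            break
        name, args = decision
        ok, reason = validate(name, args)
        observation = SKILLS[name](drone, **args) if ok else f"rejected: {reason}"
        memory.append((name, args, observation))
    return memory


if __name__ == "__main__":
    for step in run_mission("inspect the roof", scripted_policy, MockDrone()):
        print(step)
```

In this sketch the first proposed takeoff is rejected by the validator, and the rejection is written to memory so the policy can propose a lower altitude on the next iteration. Replacing `scripted_policy` with a call to a model backend, and `MockDrone` with a simulator adapter, would leave the loop itself unchanged, which is the kind of separation the brain-skill-runtime architecture is meant to provide.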
