Experience-Driven Multi-Agent Systems Are Training-Free Context-Aware Earth Observers

Recent advances have enabled large language model (LLM) agents to solve complex tasks by orchestrating external tools. However, these agents often struggle in specialized, tool-intensive domains that demand long-horizon execution, tight coordination across modalities, and strict adherence to implicit tool constraints. Earth Observation (EO) tasks exemplify this challenge: they involve multi-modal and multi-temporal inputs and are subject to geo-knowledge constraints (spectral libraries, spatial reasoning, etc.), so a sound high-level plan can still be derailed by subtle execution errors that propagate through the pipeline and invalidate the final results. A core difficulty is that existing agents lack a mechanism for learning fine-grained, tool-level expertise from interaction. Without such expertise, they cannot reliably configure tool parameters or recover from mid-execution failures, which limits their effectiveness in complex EO workflows. To address this, we introduce \textbf{GeoEvolver}, a self-evolving multi-agent system~(MAS) that enables LLM agents to acquire EO expertise through structured interaction without any parameter updates. GeoEvolver decomposes each query into independent sub-goals via a retrieval-augmented multi-agent orchestrator, then explores diverse tool-parameter configurations at the sub-goal level. Successful patterns and root-cause attributions from failures are then distilled into an evolving memory bank that provides in-context demonstrations for future queries. Experiments on three tool-integrated EO benchmarks show that GeoEvolver consistently improves end-to-end task success, with an average gain of 12\% across multiple LLM backbones, demonstrating that EO expertise can emerge progressively from efficient, fine-grained interactions with the environment.
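The explore-then-distill loop described above can be illustrated with a minimal sketch. It is not GeoEvolver's implementation: the names `Experience`, `MemoryBank`, `explore`, and `toy_ndvi` are hypothetical, retrieval is reduced to word overlap, and the "tool" is a toy function whose implicit constraint (red is band 4, near-infrared is band 8, as in Sentinel-2 numbering) stands in for a real EO tool constraint.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Experience:
    sub_goal: str
    params: dict
    success: bool
    note: str  # success pattern, or the root cause attributed to a failure


@dataclass
class MemoryBank:
    entries: list = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.entries.append(exp)

    def retrieve(self, sub_goal: str, k: int = 3) -> list:
        """Return up to k entries ranked by word overlap with sub_goal."""
        query = set(sub_goal.lower().split())
        scored = [(len(query & set(e.sub_goal.lower().split())), e)
                  for e in self.entries]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


def explore(sub_goal, tool, grid, memory):
    """Try parameter configurations for one sub-goal, recording every outcome."""
    # Configurations that succeeded on similar sub-goals are tried first.
    remembered = [e.params for e in memory.retrieve(sub_goal) if e.success]
    keys = sorted(grid)
    enumerated = [dict(zip(keys, values))
                  for values in product(*(grid[key] for key in keys))]
    for params in remembered + enumerated:
        try:
            result = tool(**params)
        except ValueError as err:
            # Keep the failure together with its attributed root cause.
            memory.add(Experience(sub_goal, params, False, f"root cause: {err}"))
            continue
        memory.add(Experience(sub_goal, params, True, "succeeded"))
        return result
    return None


calls = []  # records every tool invocation, to show how memory saves calls


def toy_ndvi(red_band: int, nir_band: int) -> str:
    """Hypothetical tool with an implicit band-numbering constraint."""
    calls.append((red_band, nir_band))
    if red_band != 4:
        raise ValueError(f"band {red_band} is not the red channel")
    if nir_band != 8:
        raise ValueError(f"band {nir_band} is not the near-infrared channel")
    return f"ndvi(red=B{red_band}, nir=B{nir_band})"


memory = MemoryBank()
grid = {"red_band": [3, 4], "nir_band": [8]}
first = explore("compute NDVI for tile A", toy_ndvi, grid, memory)
second = explore("compute NDVI for tile B", toy_ndvi, grid, memory)
print(first, second, len(calls))
```

In this toy run the first query needs two tool calls (one failure, one success), while the second query reuses the remembered configuration and succeeds on its first call. A real system would retrieve by semantic similarity and pass the stored notes to the LLM as in-context demonstrations rather than replaying parameters directly.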
