Software Engineering Agents (SWE-Agents) have proven effective on traditional software engineering tasks with accessible codebases, but their performance on embodied tasks that require well-designed information discovery remains unexplored. We present the first extended evaluation of SWE-Agents on controller generation for embodied tasks, adapting Mini-SWE-Agent (MSWEA) to solve 20 diverse embodied tasks from the Minigrid environment. Our experiments compare agent performance across information access conditions: with and without access to the environment source code, and with varying capabilities for interactive exploration. We quantify how the level of information access affects SWE-Agent performance on embodied tasks and analyze the relative importance of static code analysis versus dynamic exploration for solving them. This work establishes controller generation for embodied tasks as a crucial evaluation domain for SWE-Agents and provides baseline results for future research in efficient reasoning systems.