A reliable executable environment is a prerequisite for large language models to solve software engineering tasks. Because constructing such environments is complex and tedious, configuring them at scale remains inefficient. Moreover, most existing methods overlook fine-grained analysis of the actions the agent performs, which makes complex errors hard to handle and leads to configuration failures. To address this bottleneck, we propose EvoConfig, an efficient environment configuration framework that optimizes multi-agent collaboration to build correct runtime environments. EvoConfig features an expert diagnosis module for fine-grained post-execution analysis and a self-evolving mechanism that lets expert agents give themselves feedback and dynamically adjust error-fixing priorities in real time. Empirically, EvoConfig matches the previous state-of-the-art method, Repo2Run, on Repo2Run's 420 repositories, while delivering clear gains on harder cases: on the more challenging EnvBench, EvoConfig achieves a 78.1% success rate, outperforming Repo2Run by 7.1%. Beyond end-to-end success, EvoConfig also demonstrates stronger debugging competence, achieving higher accuracy in error identification and producing more effective repair recommendations than existing methods.