Automated software environment setup is a prerequisite for testing, debugging, and reproducing failures, yet it remains challenging in practice because of complex dependencies, heterogeneous build systems, and incomplete documentation. Recent work leverages large language models to automate this process, but typically evaluates success using weak signals such as dependency installation or partial test execution, neither of which ensures that a project can actually run. In this paper, we argue that environment setup success should be evaluated through executable evidence rather than a single binary signal. We introduce the Environment Maturity Hierarchy, which defines three success levels based on progressively stronger execution requirements, culminating in successful execution of a project's main entry point. Guided by this hierarchy, we propose HerAgent, an automated environment setup approach that incrementally constructs executable environments through execution-based validation and repair. We evaluate HerAgent on four public benchmarks, where it outperforms all prior approaches with improvements of up to 79.6\%, which we attribute to its holistic understanding of project structure and dependencies. On complex C/C++ projects, HerAgent surpasses prior approaches by 66.7\%. In addition, HerAgent resolves 11 to 30 environment instances across the benchmarks that no prior method can configure.
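To make the idea of graded, execution-based success concrete, the following is a minimal sketch of a three-level maturity check combined with a validate-and-repair loop. It is an illustration under stated assumptions, not HerAgent's implementation: the abstract only says that the hierarchy has three levels and that the highest one is successful execution of the main entry point, so the two lower level names, the `assess` and `setup_with_repair` functions, and the `repair` callback are all hypothetical.

```python
from enum import IntEnum
from typing import Callable, Dict


class Maturity(IntEnum):
    """Illustrative maturity levels, ordered from weakest to strongest."""
    NONE = 0
    DEPS_INSTALLED = 1    # assumed lower level (not named in the abstract)
    TESTS_RUN = 2         # assumed middle level (not named in the abstract)
    ENTRY_POINT_RUNS = 3  # top level described in the abstract


Check = Callable[[], bool]


def assess(checks: Dict[Maturity, Check]) -> Maturity:
    """Return the highest level reached; a level counts only if all lower ones pass."""
    reached = Maturity.NONE
    for level in sorted(checks):
        if not checks[level]():
            break
        reached = level
    return reached


def setup_with_repair(
    checks: Dict[Maturity, Check],
    repair: Callable[[Maturity], bool],
    max_rounds: int = 5,
) -> Maturity:
    """Hypothetical loop: validate by execution, repair the first failing level, repeat."""
    target = max(checks)
    reached = assess(checks)
    for _ in range(max_rounds):
        if reached == target:
            break
        # The first level above the one currently reached is the one that failed.
        failing = min(level for level in checks if level > reached)
        if not repair(failing):
            break  # the repair step gave up, so report what was achieved
        reached = assess(checks)
    return reached


if __name__ == "__main__":
    # Simulated environment; real checks would execute commands in a sandbox.
    state = {Maturity.DEPS_INSTALLED: True,
             Maturity.TESTS_RUN: False,
             Maturity.ENTRY_POINT_RUNS: False}
    demo_checks = {level: (lambda lv=level: state[lv]) for level in state}

    def demo_repair(level: Maturity) -> bool:
        state[level] = True  # pretend the repair succeeded
        return True

    print(setup_with_repair(demo_checks, demo_repair).name)
```

In a real setting, each check would run a command such as an install step, a test runner, or the project's entry point (for example via `subprocess.run`) and inspect its exit status. Reporting the highest level reached, rather than a single pass/fail flag, is what distinguishes this style of evaluation from the weak signals criticized above.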