Automated software environment setup is a prerequisite for testing, debugging, and reproducing failures, yet it remains challenging in practice because of complex dependencies, heterogeneous build systems, and incomplete documentation. Recent work leverages large language models to automate this process, but typically evaluates success using weak signals such as dependency installation or partial test execution, which do not ensure that a project can actually run. In this paper, we argue that environment setup success should be evaluated through executable evidence rather than a single binary signal. We introduce the Environment Maturity Hierarchy, which defines three success levels based on progressively stronger execution requirements, culminating in successful execution of a project's main entry point. Guided by this hierarchy, we propose HerAgent, an automated environment setup approach that incrementally constructs executable environments through execution-based validation and repair. We evaluate HerAgent on four public benchmarks, where it outperforms all prior approaches, achieving up to 79.6% improvement, which we attribute to its holistic understanding of project structure and dependencies. On complex C/C++ projects, HerAgent surpasses prior approaches by 66.7%. In addition, HerAgent uniquely resolves 11 to 30 environment instances across the benchmarks that no prior method can configure.