Reinforcement learning (RL) has become a key paradigm for training software engineering (SWE) agents, but existing pipelines typically rely on per-task containers for isolation. At scale, pre-built container images incur substantial storage overhead, slow environment setup, and require container-management privileges. We propose SWE-MiniSandbox, a lightweight, container-free method that enables scalable RL training of SWE agents without sacrificing isolation. Instead of relying on per-instance containers, SWE-MiniSandbox executes each task in an isolated workspace backed by kernel-level mechanisms, substantially reducing system overhead. It leverages lightweight environment pre-caching techniques to eliminate the need for bulky container images. As a result, our approach lowers disk usage to approximately 5\% of that required by container-based pipelines and reduces environment preparation time to about 25\% of the container baseline. Empirical results demonstrate that SWE-MiniSandbox achieves evaluation performance comparable to standard container-based pipelines. By removing the dependency on heavy container infrastructure, SWE-MiniSandbox offers a practical and accessible foundation for scaling RL-based SWE agents, particularly in resource-constrained research environments.