Multi-agent systems (MAS) have substantially advanced autonomous software engineering (SWE), but their growing inference energy demands raise sustainability concerns. In this paper, we demonstrate that this cost is concentrated in an overlooked source: redundant output tokens generated across agents. Two empirical findings ground this claim. First, our per-token energy attribution for MAS reveals a sharp asymmetry: an output token consumes 30 to 1,000 times more energy than an input or cached token. Second, MAS inflate per-episode output because agents repeatedly re-explore overlapping repository regions. To address this inefficiency, we propose Librarian, a persistent search sub-agent that tracks repository-search history and suppresses redundant exploration actions across agents. By returning short references to file regions instead of full file excerpts, Librarian further reduces output-token volume. On SWE-Bench Verified, Librarian reduces per-episode GPU energy consumption of existing multi-agent SWE systems by up to 25% while preserving task performance.
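The abstract does not specify how Librarian is implemented, so the following is only a minimal sketch of the general idea it describes: a shared record of explored repository regions that answers repeated requests with a short reference instead of the full excerpt. The names `Region`, `SearchLedger`, and `explore`, the containment rule used to decide redundancy, and the `path:start-end` reference format are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Region:
    """A span of lines in a repository file (hypothetical representation)."""
    path: str
    start: int
    end: int

    def ref(self) -> str:
        # Short reference returned in place of a full file excerpt.
        return f"{self.path}:{self.start}-{self.end}"


@dataclass
class SearchLedger:
    """Hypothetical search history shared by all agents in one episode."""
    seen: Dict[str, List[Region]] = field(default_factory=dict)

    def lookup(self, region: Region) -> Optional[str]:
        # A request counts as redundant here only if an earlier region in the
        # same file fully contains it (an assumed, deliberately simple rule).
        for prior in self.seen.get(region.path, []):
            if prior.start <= region.start and region.end <= prior.end:
                return prior.ref()
        return None

    def record(self, region: Region) -> None:
        self.seen.setdefault(region.path, []).append(region)


def explore(ledger: SearchLedger, region: Region,
            read_excerpt: Callable[[Region], str]) -> str:
    """Return a short reference for a covered region, else read and record it."""
    ref = ledger.lookup(region)
    if ref is not None:
        return f"already explored: {ref}"
    ledger.record(region)
    return read_excerpt(region)
```

In this sketch, a second agent asking for lines 20-40 of a file after another agent has read lines 10-60 receives only the reference string, so the excerpt is not produced again.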