Multi-agent systems (MAS) enable complex reasoning by coordinating multiple agents, but they often incur high inference latency from multi-step execution and repeated model invocations, which severely limits their scalability and usability in time-sensitive scenarios. Most existing approaches optimize task performance and inference cost and explicitly or implicitly assume sequential execution, which makes them suboptimal for controlling latency under parallel execution. In this work, we investigate learning-based orchestration of multi-agent systems with explicit latency supervision under parallel execution. We propose Latency-Aware Multi-agent System (LAMaS), an orchestration framework that enables parallel execution and explicitly optimizes the critical execution path, allowing the controller to construct lower-latency execution topology graphs. Experiments across multiple benchmarks show that LAMaS reduces critical path length by 38-46% compared to the state-of-the-art baseline for multi-agent architecture search, while maintaining or even improving task performance. These results highlight the importance of explicitly optimizing latency under parallel execution when designing efficient multi-agent systems. The code is available at https://github.com/xishi404/LAMaS
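To make the notion of a critical execution path concrete, the following minimal sketch computes the latency of an execution graph under parallel execution (the longest dependency chain) and compares it with the sequential sum. It is an illustration only and not the LAMaS implementation: the function name `critical_path_latency`, the agent names, and the per-agent latencies are all hypothetical, and the sketch assumes unlimited parallelism with fixed, known latencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_latency(latency, deps):
    """Return (parallel_latency, sequential_latency) for a DAG of agent calls.

    latency: dict mapping each node to its (assumed known) latency in seconds
    deps:    dict mapping a node to the set of nodes it must wait for
    """
    # Make sure every node is a key so isolated nodes are also visited.
    graph = {node: deps.get(node, ()) for node in latency}
    finish = {}
    # static_order() yields each node only after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        # A node starts once its slowest predecessor has finished.
        start = max((finish[p] for p in graph[node]), default=0.0)
        finish[node] = start + latency[node]
    # Parallel latency is the longest chain; sequential latency is the plain sum.
    return max(finish.values()), sum(latency.values())


# Hypothetical execution graph: two coders run in parallel after a planner.
latency = {"planner": 2.0, "coder_a": 3.0, "coder_b": 4.0, "reviewer": 1.0}
deps = {
    "coder_a": {"planner"},
    "coder_b": {"planner"},
    "reviewer": {"coder_a", "coder_b"},
}

parallel, sequential = critical_path_latency(latency, deps)
print(parallel, sequential)  # 7.0 10.0
```

In this toy graph the critical path is planner, coder_b, reviewer, so shortening `coder_a` would not reduce end-to-end latency under parallel execution, even though it would reduce total inference cost. This is why a cost-oriented objective and a critical-path objective can favor different topologies.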