Large language model (LLM)-based multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains challenging, primarily because local agent objectives are misaligned with holistic system goals. To address this, we introduce MASPO, a novel framework that automatically and iteratively refines prompts across the entire system. A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity but also by their capacity to facilitate downstream success for successor agents. This mechanism bridges the gap between local interactions and global outcomes without relying on ground-truth labels. Furthermore, MASPO employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space. Extensive empirical evaluations across six diverse tasks demonstrate that MASPO consistently outperforms state-of-the-art prompt optimization methods, achieving an average accuracy improvement of 2.9. We release our code at https://github.com/wangzx1219/MASPO.
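To make the idea of beam search over system-wide prompts with a joint score more concrete, the following is a minimal illustrative sketch and not the MASPO implementation. All names here (`beam_search_prompts`, `mutate`, `local_score`, `downstream_score`, `alpha`) are hypothetical, the weighted sum is only one assumed way to combine local and downstream signals, and the toy scorers are keyword checks standing in for what would in practice be LLM-based, label-free judgments.

```python
from typing import Callable, Dict, List, Tuple

Prompts = Dict[str, str]  # hypothetical representation: agent name -> prompt text


def beam_search_prompts(
    initial: Prompts,
    agent_order: List[str],
    mutate: Callable[[str], List[str]],
    local_score: Callable[[str, str], float],
    downstream_score: Callable[[Prompts, str], float],
    beam_width: int = 2,
    rounds: int = 2,
    alpha: float = 0.5,
) -> Prompts:
    """Refine one agent's prompt at a time, keeping the best prompt sets."""

    def system_score(prompts: Prompts) -> float:
        # Assumed joint score: blend each prompt's local quality with how
        # well it is judged to help the agents that come after it.
        return sum(
            (1 - alpha) * local_score(agent, prompts[agent])
            + alpha * downstream_score(prompts, agent)
            for agent in agent_order
        )

    beam: List[Prompts] = [initial]
    for _ in range(rounds):
        for agent in agent_order:
            scored: List[Tuple[float, Prompts]] = []
            for prompts in beam:
                # Keep the current prompt as a candidate next to its variants.
                for variant in [prompts[agent]] + mutate(prompts[agent]):
                    candidate = dict(prompts)
                    candidate[agent] = variant
                    scored.append((system_score(candidate), candidate))
            # Sort by score only, so dictionaries are never compared.
            scored.sort(key=lambda item: item[0], reverse=True)
            beam = [candidate for _, candidate in scored[:beam_width]]
    return beam[0]


# Toy stand-ins (assumptions for illustration only, no LLM calls).
def toy_mutate(prompt: str) -> List[str]:
    return [prompt + " Show your steps.", prompt + " Be brief."]


def toy_local(agent: str, prompt: str) -> float:
    return 1.0 if "brief" in prompt else 0.0


def toy_downstream(prompts: Prompts, agent: str) -> float:
    # Pretend successor agents do better when this agent shows its steps.
    return 1.0 if "steps" in prompts[agent] else 0.0


best = beam_search_prompts(
    {"solver": "Solve.", "checker": "Check."},
    ["solver", "checker"],
    toy_mutate,
    toy_local,
    toy_downstream,
)
print(best)
```

In this toy setup, a prompt scores highest only when it satisfies both the local check and the downstream check, which mirrors the abstract's point that local validity alone is not the selection criterion.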