The growing demand for software development has driven interest in automating software engineering (SE) tasks with Large Language Models (LLMs). Recent efforts extend LLMs into multi-agent systems (MAS) that emulate collaborative development workflows, but these systems often fail because of three core deficiencies: under-specification, coordination misalignment, and inappropriate verification. All three arise from the absence of foundational SE structuring principles. This paper introduces the Software Engineering Multi-Agent Protocol (SEMAP), a protocol-layer methodology that instantiates three core SE design principles for multi-agent LLM systems: (1) explicit behavioral contract modeling, (2) structured messaging, and (3) lifecycle-guided execution with verification. SEMAP is implemented atop Google's Agent-to-Agent (A2A) infrastructure. Empirical evaluation using the Multi-Agent System Failure Taxonomy (MAST) framework shows that SEMAP reduces failures across different SE tasks. In code development, it achieves up to a 69.6% reduction in total failures for function-level development and 56.7% for deployment-level development. In vulnerability detection, SEMAP reduces failure counts by up to 47.4% on Python tasks and 28.2% on C/C++ tasks.
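To make the three design principles concrete, the following is a minimal Python sketch of how a behavioral contract, a structured message, and a verified lifecycle step could fit together. It is an illustration only, not SEMAP's actual implementation and not the A2A API: the names `Stage`, `Contract`, `Message`, and `handle`, as well as the particular lifecycle stages, are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List


class Stage(Enum):
    """Hypothetical lifecycle stages; the stages used by SEMAP may differ."""
    SPECIFY = 1
    IMPLEMENT = 2
    VERIFY = 3
    DONE = 4


@dataclass(frozen=True)
class Contract:
    """Explicit behavioral contract: what an agent requires and what it promises."""
    agent: str
    required_inputs: List[str]
    postcondition: Callable[[Dict[str, Any]], bool]


@dataclass(frozen=True)
class Message:
    """Structured message: typed fields instead of free-form chat text."""
    sender: str
    receiver: str
    stage: Stage
    payload: Dict[str, Any] = field(default_factory=dict)


def handle(contract: Contract, message: Message,
           work: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Message:
    """Run one lifecycle step and only advance the stage if verification passes."""
    if message.stage is Stage.DONE:
        raise ValueError("lifecycle already finished")

    # Guard against under-specification: reject requests with missing inputs.
    missing = [k for k in contract.required_inputs if k not in message.payload]
    if missing:
        raise ValueError(f"under-specified request, missing: {missing}")

    result = work(message.payload)

    # Verification: the output must satisfy the contract's postcondition.
    if not contract.postcondition(result):
        raise RuntimeError(f"{contract.agent} violated its postcondition")

    # Lifecycle-guided execution: the reply carries the next stage.
    next_stage = Stage(message.stage.value + 1)
    return Message(sender=contract.agent, receiver=message.sender,
                   stage=next_stage, payload=result)


if __name__ == "__main__":
    coder = Contract(
        agent="coder",
        required_inputs=["spec"],
        postcondition=lambda out: bool(out.get("code", "").strip()),
    )
    request = Message("planner", "coder", Stage.IMPLEMENT, {"spec": "add two ints"})
    reply = handle(coder, request, lambda p: {"code": "def add(a, b): return a + b"})
    print(reply.stage, reply.receiver)
```

In this sketch each of the three failure classes named above maps to one explicit check: missing inputs are rejected before any work starts, messages have a fixed shape shared by sender and receiver, and a stage transition happens only after the postcondition holds.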