Towards Engineering Multi-Agent LLMs: A Protocol-Driven Approach

The increasing demand for software development has driven interest in automating software engineering (SE) tasks using Large Language Models (LLMs). Recent efforts extend LLMs into multi-agent systems (MAS) that emulate collaborative development workflows, but these systems often fail due to three core deficiencies: under-specification, coordination misalignment, and inappropriate verification, arising from the absence of foundational SE structuring principles. This paper introduces Software Engineering Multi-Agent Protocol (SEMAP), a protocol-layer methodology that instantiates three core SE design principles for multi-agent LLMs: (1) explicit behavioral contract modeling, (2) structured messaging, and (3) lifecycle-guided execution with verification, and is implemented atop Google's Agent-to-Agent (A2A) infrastructure. Empirical evaluation using the Multi-Agent System Failure Taxonomy (MAST) framework demonstrates that SEMAP effectively reduces failures across different SE tasks. In code development, it achieves up to a 69.6% reduction in total failures for function-level development and 56.7% for deployment-level development. For vulnerability detection, SEMAP reduces failure counts by up to 47.4% on Python tasks and 28.2% on C/C++ tasks.
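The three principles named in the abstract can be illustrated with a small sketch. This is not SEMAP's implementation and does not use the A2A API. Every name below (`Stage`, `Message`, `Contract`, `run_task`, `stub_agent`) is hypothetical, and the lifecycle stages are assumed, since the abstract does not enumerate them. The sketch only shows how a contract, a typed message, and a verify-before-accept step might fit together.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Stage(Enum):
    """Hypothetical lifecycle outcomes; the abstract does not list stages."""
    VERIFIED = "verified"
    FAILED = "failed"


@dataclass(frozen=True)
class Message:
    """Structured message: named fields instead of free-form chat text."""
    sender: str
    receiver: str
    task_id: str
    payload: dict[str, Any]


@dataclass(frozen=True)
class Contract:
    """Behavioral contract: what an agent must receive and must produce."""
    role: str
    required_inputs: frozenset[str]
    postcondition: Callable[[dict[str, Any]], bool]

    def accepts(self, msg: Message) -> bool:
        # A request missing any required field counts as under-specified.
        return self.required_inputs.issubset(msg.payload)


def run_task(
    contract: Contract,
    msg: Message,
    agent: Callable[[dict[str, Any]], dict[str, Any]],
) -> Stage:
    """Lifecycle-guided execution: check inputs, run, then verify output."""
    if not contract.accepts(msg):
        return Stage.FAILED  # rejected before the agent does any work
    result = agent(msg.payload)
    return Stage.VERIFIED if contract.postcondition(result) else Stage.FAILED


def stub_agent(payload: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for an LLM-backed agent (assumption: returns a code string)."""
    return {"code": f"def {payload['signature']}:\n    pass"}


coder_contract = Contract(
    role="coder",
    required_inputs=frozenset({"spec", "signature"}),
    postcondition=lambda out: out.get("code", "").startswith("def "),
)

complete = Message("planner", "coder", "t1",
                   {"spec": "add two ints", "signature": "add(a, b)"})
incomplete = Message("planner", "coder", "t2", {"spec": "add two ints"})

print(run_task(coder_contract, complete, stub_agent))
print(run_task(coder_contract, incomplete, stub_agent))
```

In this toy setup the incomplete request is rejected by the contract before the agent runs, and the agent's output is only accepted once the postcondition holds. These two checks correspond loosely to the under-specification and verification deficiencies the abstract describes.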
