From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents

AI tools for programming are no longer just autocomplete or chat assistants: they are now organized as development frameworks, with process, roles, artifacts and verification. Recent surveys map agents and LLMs for software engineering, but no study has yet centered on the operational frameworks that turn these capabilities into process. We ran a directed search of primary sources, with a functional inclusion criterion and a traction measurement, and selected six frameworks: GitHub Spec Kit, OpenSpec, BMAD Method, Get Shit Done (GSD), Spec Kitty and Reversa. Each approaches AI-assisted development along a different path: spec-driven development in full and lightweight variants, agent-driven agile planning, context engineering over the agent, worktree isolation and review, and recovery of operational specifications from legacy systems. Our central contribution is a six-dimension process taxonomy (specification, context, roles, execution, validation and portability) with a scoring rubric that turns it into a replicable instrument. We apply it to the six frameworks and to an out-of-sample case, Spec-Flow. Two results stand out. First, among frameworks that already adopt some process there is convergence: the isolated prompt loses centrality, and persistent artifacts, work contracts, traceability and human review become the mechanisms that reduce ambiguity and coordinate agents. Second, no framework strongly covers all six dimensions, which exposes a structural trade-off between process depth and portability across agents. We also found recurring risks: drift between specification and code, excessive trust in generated artifacts, fragility of community extensions, platform dependence and a lack of benchmarks for the complete process. We close with a research agenda for empirical evaluation, focused on intermediate-quality metrics, context governance, installation security and reproducibility.

