Generative AI plays an increasing role in software engineering activities, e.g., to make them more efficient or to improve their quality. However, it is often unclear how much benefit LLMs actually provide. We focus on software architects and investigated how an LLM-supported evaluation of architecture documents can help software architects improve such artifacts. In the context of a research project in which a digital marketplace is developed and digital solutions are analyzed, we used different LLMs to assess the quality of architecture documents and compared the results with evaluations by software architects. We found that the quality of the artifact strongly influences the quality of the LLM-based evaluation: the better the quality of the architecture document, the more consistent the LLM-based evaluation was with the human expert evaluation. While using LLMs for this architecture task is promising, our results showed inconsistencies that require further analysis before the findings can be generalized.