In software engineering (SE) research and practice, UML is well known as an essential modeling methodology for requirements analysis and software modeling in both academia and industry. In particular, fundamental knowledge of UML modeling and practice in creating high-quality UML models are included in SE-related courses in the undergraduate programs of many universities. As a result, reviewing and grading the large number of UML models created by students is a time-consuming and labor-intensive task for educators. Recent advances in generative AI techniques, such as ChatGPT, have paved new ways to automate many SE tasks. However, current research and tools have seldom explored the capability of ChatGPT to evaluate the quality of UML models. This paper investigates the feasibility and effectiveness of ChatGPT in assessing the quality of UML use case diagrams, class diagrams, and sequence diagrams. First, 11 evaluation criteria with grading details were proposed for these UML models. Next, a series of experiments was designed and conducted on 40 students' UML modeling reports to explore ChatGPT's performance in evaluating and grading these UML diagrams. The findings reveal that ChatGPT performed well on this assessment task: the scores it assigned to the UML models are close to those given by human experts. Three types of evaluation discrepancies between ChatGPT and the human experts were also identified, and these vary across the evaluation criteria used for the different types of UML models.