Large Language Models (LLMs) are widely employed to automate tasks across the software development life-cycle. It is unclear, however, whether they perform these tasks consistently with the semantics of the artefacts being handled. This question is particularly under-researched for architectural design specifications. In this paper, we address it for High-Level Message Sequence Charts (HMSCs): visual models with a rigorous formal semantics that have been used for various purposes, including as a foundation for Sequence Diagrams in the Unified Modelling Language (UML). We examine whether LLMs "understand" the semantics of HMSCs by evaluating three LLMs (Gemini-3, GPT-5.4, and Qwen-3.6) on 129 semantic tasks, ranging from querying basic semantic constructs in HMSCs (i.e., events and their ordering), through semantics-preserving abstractions and compositions, to calculating the set of traces and trace-equivalent labelled transition systems (LTSs). The results show that LLMs have only a modest understanding of the formal semantics of HMSCs (ca. 52% overall accuracy), with great variability across semantic concepts: while LLMs seem to understand the basic semantic concepts of MSCs (ca. 88% accuracy), they struggle with semantic reasoning in tasks involving abstraction and composition (ca. 36% accuracy) and traces and LTSs (ca. 42% accuracy). In particular, all three LLMs struggle with the notions of co-region and explicit causal dependencies and never employed them in semantics-preserving transformations.