Leveraging LLMs for Grammar Adaptation: A Study on Metamodel-Grammar Co-Evolution

In model-driven engineering, metamodel evolution requires the corresponding grammars to be adapted to keep them consistent, which typically involves tedious manual work. Existing rule-based methods can partially automate this process but struggle with complex grammar scenarios. This paper proposes a Large Language Model (LLM)-based approach that learns grammar adaptations from previous versions and automatically applies them to the new grammar obtained after evolution. We evaluated this approach on six real-world Xtext domain-specific languages (DSLs): four DSLs served as a training set for developing prompting strategies and two as a test set for validation. We additionally conducted a longitudinal case study on QVTo. The evaluation used three LLMs (Claude Sonnet 4.5, ChatGPT 5.1, and Gemini 3) and measured grammar adaptation quality along three dimensions: rule-level adaptation consistency, output similarity, and metamodel conformance. On the test set, all three LLMs achieved 100% adaptation consistency and output similarity, whereas the rule-based approach reached only 84.21% on DOT and 62.50% on Xcore. In the QVTo longitudinal study, the LLM-based approach reused learned adaptations across all three evolution steps without any manual grammar editing, while the rule-based approach required manual adjustments in two of the three transitions. However, on large-scale grammars (EAST-ADL, with 297 rules), the LLMs' adaptation consistency remained well below 90%. This study demonstrates the advantages of LLM-based approaches in complex grammar scenarios while revealing their limitations in adapting large-scale grammars.