Understanding and extracting the grammar of a domain-specific language (DSL) is crucial for many software engineering tasks, yet manually writing these grammars is time-intensive and error-prone. This paper presents Kajal, a novel approach that automatically infers grammars from DSL code snippets by leveraging Large Language Models (LLMs) through prompt engineering and few-shot learning. Kajal dynamically constructs input prompts, using contextual information to guide the LLM toward the corresponding grammar, and iteratively refines the generated grammar through a feedback-driven loop. In our experiments, Kajal reaches 60% accuracy with few-shot learning and 45% without it, showing that few-shot examples substantially improve the tool's effectiveness. These results suggest a promising route to automating DSL grammar extraction; future work will explore smaller, open-source LLMs and larger datasets to further validate Kajal's performance.
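The prompt-construct/generate/validate/refine cycle described above can be sketched as follows. This is a minimal illustration, not Kajal's actual implementation: `call_llm`, the prompt template, and the `validate` step are hypothetical stand-ins (the LLM is stubbed so the example runs offline), shown only to convey the iterate-until-the-grammar-parses idea.

```python
# Hypothetical sketch of a feedback-driven grammar-inference loop.
# call_llm is a stand-in for a real LLM API call; it is stubbed here
# so the example runs without network access.

def call_llm(prompt):
    # Stub: pretend the model returns a malformed grammar on the first
    # attempt and a usable one once it receives feedback.
    call_llm.attempts += 1
    if call_llm.attempts < 2:
        return "expr: NUMBER '+'"           # malformed candidate
    return "expr: NUMBER '+' NUMBER"        # usable candidate
call_llm.attempts = 0

def validate(grammar, snippet):
    # Stand-in check: a real tool would generate a parser from the
    # candidate grammar and try to parse the snippet with it.
    return grammar.count("NUMBER") == 2

def infer_grammar(snippet, examples, max_rounds=5):
    """Iteratively prompt, validate, and refine until the grammar parses."""
    feedback = ""
    for _ in range(max_rounds):
        prompt = (
            "Infer a grammar for this DSL snippet.\n"
            + "".join(f"Example: {e}\n" for e in examples)  # few-shot context
            + f"Snippet: {snippet}\n{feedback}"
        )
        grammar = call_llm(prompt)
        if validate(grammar, snippet):
            return grammar
        # Feed the failure back into the next prompt (iterative refinement).
        feedback = f"Previous attempt failed to parse: {grammar}. Fix it."
    return None

print(infer_grammar("1 + 2", ["3 + 4"]))
```

The key design point, as in the abstract, is that the prompt is rebuilt each round: few-shot examples supply context up front, and validation errors are appended as feedback so the model can correct its previous candidate.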