Recently, Large Language Models (LLMs) have showcased their potential in various natural language processing tasks, including code generation. However, while significant progress has been made in adapting LLMs to generate code for several imperative programming languages and tasks, there remains a notable gap in their application to declarative formalisms, such as Answer Set Programming (ASP). In this paper, we take a step towards exploring the capabilities of LLMs for ASP code generation. First, we perform a systematic evaluation of several state-of-the-art LLMs. Despite their scale in terms of parameter count, training data, and computational resources, empirical results demonstrate inadequate performance in generating correct ASP programs. Therefore, we propose LLASP, a fine-tuned lightweight model specifically trained to encode fundamental ASP program patterns. To this end, we create an ad-hoc dataset covering a wide variety of fundamental problem specifications that can be encoded in ASP. Our experiments demonstrate that the quality of ASP programs generated by LLASP is remarkable. This holds true not only in comparison with its non-fine-tuned counterpart but also with the majority of the larger LLM candidates, particularly from a semantic perspective. All the code and data used to perform the experiments are publicly available at https://anonymous.4open.science/r/LLASP-D86C/.