Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code-related tasks, ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource programming languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools, tool chains for legacy languages, and formal verification frameworks. Inspired by a technique called natural programming elicitation, we propose designing an intermediate language that LLMs "naturally" know how to use and that can be automatically compiled to a target VLPL. When LLMs generate code that lies outside this intermediate language, we use compiler techniques to repair the code into programs in the intermediate language. Overall, we introduce \emph{synthetic programming elicitation and compilation} (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs more frequently, without sacrificing semantic correctness.