Recent advances in large language models (LLMs) for code have demonstrated remarkable zero-shot fluency and instruction following on challenging code-related tasks ranging from test case generation to self-repair. Unsurprisingly, however, these models struggle to compose syntactically valid programs in programming languages that are unrepresented in pre-training, referred to as very low-resource programming languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools and tool-chains for legacy languages. Inspired by an HCI technique called natural program elicitation, we propose designing an intermediate language that LLMs ``naturally'' know how to use and that can be automatically compiled to a target VLPL. When LLMs generate code that falls outside this intermediate language, we use compiler techniques to repair it into a program in the intermediate language. Overall, we introduce \emph{synthetic programming elicitation and compilation} (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate SPEAC in a case study and find that, compared to existing retrieval and fine-tuning baselines, it produces syntactically correct programs significantly more frequently without sacrificing semantic correctness.