Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code-related tasks, ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource programming languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools and tool-chains for legacy languages. Inspired by a technique from human-computer interaction called natural program elicitation, we propose designing an intermediate language that LLMs ``naturally'' know how to use and that can be automatically compiled to a target VLPL. When LLMs generate code that lies outside this intermediate language, we use compiler techniques to repair it into a program in the intermediate language. Overall, we introduce \emph{synthetic programming elicitation and compilation} (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs significantly more frequently without sacrificing semantic correctness.