This work introduces self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding. Our approach builds on the observation that recent infilling-capable code language models can self-infill: whereas a conventional infilling operation fills in the middle given a predefined prefix and suffix, self-infilling sequentially generates both the surrounding context and the infilled content. We use this capability to introduce novel interruption and looping mechanisms into conventional decoding, turning it into a non-monotonic process. Interruptions postpone the generation of specific code until a definitive suffix is established, enhancing control over the output. The looping mechanism, which leverages the complementary nature of self-infilling and left-to-right decoding, iteratively updates and synchronizes each generated piece. Extensive experiments demonstrate that the proposed decoding process improves both regularity and quality across several code generation benchmarks.
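To make the contrast between infilling and self-infilling concrete, the following is a minimal sketch and not the implementation evaluated in this work. It makes several assumptions: the sentinel strings `<PRE>`, `<SUF>`, and `<MID>` are hypothetical placeholders (real infilling models define their own special tokens), `toy_generate` is a deterministic stand-in for a code language model, and the update order inside `self_infill_loop` is only one plausible reading of the looping mechanism.

```python
from typing import Callable

# Hypothetical sentinel strings (assumption): real models use their own tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

# Stand-in type for a model call: maps a prompt to a continuation.
Generate = Callable[[str], str]


def infill(generate: Generate, prefix: str, suffix: str) -> str:
    """Conventional infilling: prefix and suffix are both given up front."""
    middle = generate(f"{PRE}{prefix}{SUF}{suffix}{MID}")
    return prefix + middle + suffix


def self_infill(generate: Generate, prefix: str) -> str:
    """Self-infilling with one interruption: the model first writes its own
    suffix, then fills in the middle conditioned on that suffix."""
    suffix = generate(f"{PRE}{prefix}{SUF}")               # interrupt, defer the middle
    middle = generate(f"{PRE}{prefix}{SUF}{suffix}{MID}")  # infill toward the fixed suffix
    return prefix + middle + suffix


def self_infill_loop(generate: Generate, prefix: str, rounds: int = 2) -> str:
    """Assumed looping scheme: alternate infilling the middle with a
    left-to-right pass that regenerates the suffix from prefix + middle."""
    suffix = generate(f"{PRE}{prefix}{SUF}")
    middle = ""
    for _ in range(rounds):
        middle = generate(f"{PRE}{prefix}{SUF}{suffix}{MID}")
        suffix = generate(prefix + middle)  # left-to-right update of the suffix
    return prefix + middle + suffix


def toy_generate(prompt: str) -> str:
    """Deterministic toy model (assumption for the demo, not a real LM)."""
    if prompt.endswith(MID):
        return "    total = a + b\n"   # middle, requested via the MID sentinel
    return "    return total\n"        # suffix, via SUF or plain left-to-right


code = self_infill(toy_generate, "def add(a, b):\n")
print(code)
```

In this sketch the interruption happens immediately after the prefix; in practice the point at which decoding is interrupted, and the criterion for ending the loop, would be design choices of the actual decoding procedure.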