This work introduces a general code generation framework that incorporates infilling operations into auto-regressive decoding. Our approach builds on the observation that recent code language models with infilling capabilities can perform \emph{self-infilling}: whereas a standard infilling operation fills in the middle given a predefined prefix and suffix, self-infilling sequentially generates both the surrounding context and the infilled content. We use this capability to develop an infilling-augmented decoding process that facilitates non-monotonic generation. The process can postpone the generation of uncertain code snippets until a definitive suffix is established, which gives improved control over the generation sequence. It also enables a looping mechanism that iteratively updates and synchronizes each piece of the generation in a cyclic manner. Extensive experiments demonstrate that the proposed decoding process improves regularity and quality across several code generation benchmarks.
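The generation order described in the abstract can be illustrated with a minimal sketch. It assumes a model exposed through three hypothetical callables (`write_prefix`, `write_suffix`, `infill`), which stand in for calls to a single infilling-capable code language model and are not part of any real library. The looping schedule shown (regenerate the suffix given the current prefix and middle, then refill the middle) is one plausible reading of the cyclic update, not the procedure specified by the authors, and the toy stubs return fixed strings only so the control flow can be run end to end.

```python
from typing import Callable, Tuple

# Hypothetical stand-ins for calls to one infilling-capable code LM.
WritePrefix = Callable[[str], str]   # prompt -> prefix continuation
WriteSuffix = Callable[[str], str]   # context so far -> suffix
Infill = Callable[[str, str], str]   # (prefix, suffix) -> middle


def self_infill(
    prompt: str,
    write_prefix: WritePrefix,
    write_suffix: WriteSuffix,
    infill: Infill,
    loops: int = 0,
) -> Tuple[str, str, str]:
    """Return (prefix, middle, suffix), produced in non-monotonic order."""
    # 1. Decode left to right up to the point where a span is deferred.
    prefix = prompt + write_prefix(prompt)
    # 2. Skip the deferred span and commit to a suffix first.
    suffix = write_suffix(prefix)
    # 3. Fill the deferred middle, now conditioned on both sides.
    middle = infill(prefix, suffix)
    # 4. Optional looping (assumed schedule): refresh the suffix given the
    #    current prefix and middle, then refill the middle against it.
    for _ in range(loops):
        suffix = write_suffix(prefix + middle)
        middle = infill(prefix, suffix)
    return prefix, middle, suffix


# Toy stubs with fixed outputs; a real system would query the model here.
def toy_write_prefix(prompt: str) -> str:
    return "    total = 0\n"


def toy_write_suffix(context: str) -> str:
    return "    return total\n"


def toy_infill(prefix: str, suffix: str) -> str:
    return "    for x in xs:\n        total += x\n"


prefix, middle, suffix = self_infill(
    "def sum_list(xs):\n", toy_write_prefix, toy_write_suffix, toy_infill, loops=1
)
# The pieces are generated out of order but assembled in reading order.
program = prefix + middle + suffix
print(program)
```

The point of the sketch is the ordering: the suffix is fixed before the middle is written, so the middle is always generated with its right-hand context available.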