Fill-in-the-Middle (FIM) has become integral to code language models, enabling generation of missing code given both left and right contexts. However, the current FIM training paradigm, which reorders original training sequences and then performs regular next-token prediction (NTP), often leaves models struggling to generate content that aligns smoothly with the surrounding context. Crucially, while existing works rely on rule-based post-processing to circumvent this weakness, such methods are not practically usable in open-domain code completion tasks, as they depend on restrictive, dataset-specific assumptions (e.g., generating the same number of lines as in the ground truth). Moreover, model performance on FIM tasks deteriorates significantly without these unrealistic assumptions. We hypothesize that NTP alone is insufficient for models to learn effective planning conditioned on the distant right context, a critical factor for successful code infilling. To overcome this, we propose Horizon-Length Prediction (HLP), a novel training objective that teaches models to predict the number of remaining middle tokens (i.e., the horizon length) at each step. HLP advances FIM with lookahead planning, enabling models to inherently learn infilling boundaries for arbitrary left and right contexts without relying on dataset-specific post-processing. Our evaluation across different models and sizes shows that HLP significantly improves FIM performance, with relative gains of up to 24% on diverse file-level and repository-level benchmarks, without resorting to unrealistic post-processing methods. Furthermore, the enhanced planning capability gained through HLP boosts model performance on code reasoning. Importantly, HLP incurs only negligible training overhead and no additional inference cost, ensuring its practicality for real-world scenarios.
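To make the objective concrete, the following is a minimal sketch of how per-token HLP targets and an auxiliary loss could be constructed. The helper names (`hlp_targets`, `hlp_loss`), the normalization of the horizon to a fraction of the middle span, and the choice of mean-squared error are illustrative assumptions; the paper's exact head architecture and loss formulation may differ.

```python
# Hedged sketch of Horizon-Length Prediction (HLP) target construction.
# All names and design choices here are hypothetical illustrations of
# the idea "predict the number of remaining middle tokens at each step".

def hlp_targets(middle_len: int) -> list[float]:
    """For each of the `middle_len` positions in the FIM middle span,
    return the fraction of middle tokens still remaining (a normalized
    horizon length). At position i, middle_len - i tokens remain,
    so the target decays linearly from 1.0 toward 0."""
    return [(middle_len - i) / middle_len for i in range(middle_len)]

def hlp_loss(predictions: list[float], targets: list[float]) -> float:
    """Mean-squared-error auxiliary loss between per-token horizon
    predictions (e.g., from a small regression head on the hidden
    states) and the targets; added to the usual NTP loss."""
    assert len(predictions) == len(targets) and targets
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Example: a 4-token middle span yields targets [1.0, 0.75, 0.5, 0.25].
targets = hlp_targets(4)
```

Because the targets are a deterministic function of each training sequence, they are computed on the fly during training (negligible overhead), and the auxiliary head is simply dropped at inference time, consistent with HLP adding no inference cost.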