Diffusion language models offer a compelling alternative to autoregressive code generation, enabling global planning and iterative refinement of complex program logic. However, existing approaches fail to respect the rigid structure of programming languages and, as a result, often produce broken programs that fail to execute. To address this, we introduce AnchorTree, a framework that explicitly anchors the diffusion process using structured, hierarchical priors native to code. Specifically, AnchorTree uses the abstract syntax tree to prioritize resolving syntactically and semantically salient tokens, such as keywords (e.g., if, while) and identifiers (e.g., variable names), thereby establishing a structural scaffold that guides the remaining generation. We validate this framework via AnCoder, a family of models showing that structurally anchored diffusion offers a parameter-efficient path to high-quality code generation.
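To make the anchoring idea concrete, the following is a minimal sketch of how an abstract syntax tree could be used to rank tokens by salience so that keywords and identifiers are resolved before the remaining tokens. It is an illustration under stated assumptions, not the AnchorTree implementation: the three priority tiers, the function names `anchor_priorities` and `unmask_order`, and the use of Python's own `ast` and `tokenize` modules are all hypothetical choices made for this example.

```python
import ast
import io
import keyword
import tokenize

# Token types that carry layout only and are ignored in this sketch.
_LAYOUT = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def anchor_priorities(source: str) -> list[tuple[str, int]]:
    """Return (token, priority) pairs; a lower priority is resolved earlier.

    Hypothetical tiers (an assumption of this sketch):
      0 = keyword, 1 = identifier found in the AST, 2 = everything else.
    """
    tree = ast.parse(source)
    identifiers: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            identifiers.add(node.id)      # variable reads and writes
        elif isinstance(node, ast.arg):
            identifiers.add(node.arg)     # function parameters
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            identifiers.add(node.name)    # definition names

    ranked = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _LAYOUT:
            continue
        if keyword.iskeyword(tok.string):
            priority = 0
        elif tok.type == tokenize.NAME and tok.string in identifiers:
            priority = 1
        else:
            priority = 2                  # operators, literals, punctuation
        ranked.append((tok.string, priority))
    return ranked


def unmask_order(ranked: list[tuple[str, int]]) -> list[int]:
    """Token positions sorted by priority; ties keep left-to-right order."""
    return sorted(range(len(ranked)), key=lambda i: ranked[i][1])


if __name__ == "__main__":
    code = "def f(x):\n    if x:\n        return x + 1\n    return 0\n"
    tokens = anchor_priorities(code)
    for position in unmask_order(tokens):
        print(position, tokens[position])
```

In a diffusion sampler, an ordering like this would determine which masked positions are committed first, so that the structural scaffold (here `def`, `if`, `return`, then `f` and `x`) is fixed before operators and literals are filled in. How AnchorTree actually derives and applies its priorities is not specified in the abstract.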