Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods such as constrained decoding alleviate this problem by externally rejecting untypable code, the model itself never learns type reasoning internally, which ultimately limits its overall performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation, guiding the model to learn the type system. The core of our approach is a type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on synthesis decision sequences rather than traditional text-based token sequences. By offloading the complexity of type-system learning to the representation itself, models can redirect their computational resources toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of aligning LMs with type systems internally.
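To make the core idea concrete, the following is a minimal toy sketch (not TyFlow's actual system; all names are hypothetical) of representing a program as a sequence of synthesis decisions rather than text tokens. At each hole, the only available choices are productions whose result type matches the current goal type, so every decodable decision sequence is well-typed by construction:

```python
# Hypothetical toy grammar: production name -> (result type, child goal types,
# and a function rendering the production from its rendered children).
GRAMMAR = {
    "lit0": ("int",  [],             lambda: "0"),
    "lit1": ("int",  [],             lambda: "1"),
    "plus": ("int",  ["int", "int"], lambda a, b: f"({a} + {b})"),
    "true": ("bool", [],             lambda: "true"),
    "isz":  ("bool", ["int"],        lambda a: f"isZero({a})"),
}

def choices(goal_type):
    """Productions whose result type matches the goal. Untypable programs
    never even appear as options, mirroring a typing-rule premise."""
    return [name for name, (ty, _, _) in GRAMMAR.items() if ty == goal_type]

def decode(decisions, goal_type):
    """Expand holes depth-first, consuming one decision (an index into the
    type-filtered choice list) per hole. The decision sequence *is* the
    program representation; text is only produced at the end."""
    it = iter(decisions)
    def expand(ty):
        opts = choices(ty)
        name = opts[next(it) % len(opts)]
        _, child_types, render = GRAMMAR[name]
        return render(*(expand(ct) for ct in child_types))
    return expand(goal_type)

# Decisions [2, 0, 1] at goal "int" select plus, then lit0 and lit1:
print(decode([2, 0, 1], "int"))      # -> (0 + 1)
print(decode([1, 2, 0, 1], "bool"))  # -> isZero((0 + 1))
```

Here each decision index is taken relative to the typed choice list, which is the sense in which the synthesis derivation mirrors the type derivation: a model trained over such sequences cannot express an ill-typed program at all.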