We present SynCode, a novel framework for efficient and general syntactical decoding of code with large language models (LLMs). SynCode leverages the grammar of a programming language through the DFA mask store, an efficient lookup table constructed offline from the terminals of the language grammar. We show that SynCode is sound and complete given the context-free grammar (CFG) of the programming language: it retains syntactically valid tokens while rejecting invalid ones. The framework integrates with any language defined by a CFG, as evidenced by experiments on CFGs for Python and Go. When SynCode is combined with state-of-the-art LLMs, syntax errors are reduced by 96.07%, demonstrating its substantial impact on syntactic precision in code generation. Our code is available at https://github.com/uiuc-focal-lab/syncode.
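To make the idea of an offline mask lookup table concrete, the following is a minimal sketch and not SynCode's actual implementation. It assumes a single toy terminal (an identifier), a hand-written DFA for it, and a seven-token vocabulary; the names `step`, `walk`, `build_mask_store`, `apply_mask`, and `greedy_pick` are all hypothetical. For each DFA state it precomputes which vocabulary tokens can be consumed without reaching a dead state, then uses that table at decoding time to mask out invalid tokens. How the real system handles many terminals and the parser state is not shown here.

```python
import math

# Hypothetical toy DFA for one terminal, IDENTIFIER: [A-Za-z_][A-Za-z0-9_]*
# State 0 = nothing consumed yet, state 1 = inside an identifier, None = dead.
def step(state, ch):
    if state == 0:
        return 1 if (ch.isalpha() or ch == "_") else None
    if state == 1:
        return 1 if (ch.isalnum() or ch == "_") else None
    return None

DFA_STATES = [0, 1]

# Hypothetical LLM vocabulary; real vocabularies have tens of thousands of tokens.
VOCAB = ["x", "foo", "1", "23", "4a", "_y", " "]

def walk(state, token):
    """Feed every character of `token` to the DFA; return None if it dies."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

def build_mask_store(states, vocab):
    """Offline step: for each DFA state, a boolean mask over the vocabulary."""
    return {s: [walk(s, tok) is not None for tok in vocab] for s in states}

MASK_STORE = build_mask_store(DFA_STATES, VOCAB)

def apply_mask(logits, state):
    """Online step: one table lookup, then suppress tokens the DFA rejects."""
    mask = MASK_STORE[state]
    return [l if ok else -math.inf for l, ok in zip(logits, mask)]

def greedy_pick(logits, state):
    """Choose the highest-scoring token among those the mask allows."""
    masked = apply_mask(logits, state)
    best = max(range(len(masked)), key=masked.__getitem__)
    return VOCAB[best]

if __name__ == "__main__":
    logits = [0.1, 0.2, 3.0, 0.5, 0.4, 0.3, 2.0]
    print(greedy_pick(logits, 0))  # digits cannot start an identifier
    print(greedy_pick(logits, 1))  # digits are fine once inside one
```

The design point this illustrates is that the per-token DFA walk happens once, offline, so the per-step cost during generation is a lookup plus an element-wise mask rather than a scan of the vocabulary against the grammar.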