Large Language Models (LLMs) struggle to reliably generate highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what tokens an LLM can output at each step to guarantee that the output matches a given constraint. Specifically, in grammar-constrained decoding (GCD), the LLM's output must follow a given grammar. In this paper, we demonstrate that GCD techniques (and constrained decoding techniques in general) can distort the LLM's distribution, leading to outputs that are grammatical but appear with likelihoods that are not proportional to the ones given by the LLM, and that are therefore ultimately of low quality. We call the problem of aligning sampling with a grammar constraint grammar-aligned decoding (GAD), and propose adaptive sampling with approximate expected futures (ASAp), a decoding algorithm that guarantees that the output is grammatical while provably producing outputs that match the LLM's distribution conditioned on the given grammar constraint. Our algorithm uses prior sample outputs to soundly overapproximate the future grammaticality of different output prefixes. Our evaluation on code generation and structured NLP tasks shows that ASAp often produces outputs with higher likelihood (according to the LLM's distribution) than existing GCD techniques, while still enforcing the desired grammatical constraints.
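To make the distortion concrete, the following toy sketch (not the paper's implementation; the vocabulary, probabilities, and grammar are invented for illustration) compares greedy GCD, which masks ungrammatical tokens and renormalizes locally at each step, against the GAD target, the LLM's distribution globally conditioned on grammaticality:

```python
# Toy illustration of how greedy grammar-constrained decoding (GCD)
# distorts the LLM's distribution. Vocabulary {a, b}, strings of
# length 2; the (hypothetical) grammar forbids the string "aa".

# Hypothetical LLM next-token probabilities.
P1 = {"a": 0.9, "b": 0.1}                      # P(x1)
P2 = {"a": {"a": 0.9, "b": 0.1},               # P(x2 | x1 = "a")
      "b": {"a": 0.5, "b": 0.5}}               # P(x2 | x1 = "b")

grammatical = {"ab", "ba", "bb"}               # language L (excludes "aa")

# GAD target: the true conditional distribution P(x | x in L),
# obtained by renormalizing the joint LLM probability globally.
joint = {x1 + x2: P1[x1] * P2[x1][x2] for x1 in P1 for x2 in P2[x1]}
Z = sum(p for s, p in joint.items() if s in grammatical)
gad = {s: p / Z for s, p in joint.items() if s in grammatical}

# What greedy GCD samples instead: at each step, mask tokens that
# cannot lead to any grammatical completion, then renormalize locally.
def gcd_dist():
    out = {}
    ok1 = [t for t in P1 if any(t + x2 in grammatical for x2 in P2[t])]
    z1 = sum(P1[t] for t in ok1)
    for x1 in ok1:
        ok2 = [t for t in P2[x1] if x1 + t in grammatical]
        z2 = sum(P2[x1][t] for t in ok2)
        for x2 in ok2:
            out[x1 + x2] = (P1[x1] / z1) * (P2[x1][x2] / z2)
    return out

gcd = gcd_dist()
print("GAD target:", gad)
print("GCD output:", gcd)
```

Here GCD assigns "ab" probability 0.9 (the mass of "aa" is silently shifted onto "ab" at the second step), while the true grammar-conditioned probability is only 0.09/0.19 ≈ 0.47: a grammatical but low-likelihood prefix gets heavily overweighted, which is exactly the distortion ASAp is designed to correct.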