Even after fine-tuning and reinforcement learning, large language models (LLMs) can be difficult, if not impossible, to control reliably with prompts alone. We propose a new inference-time approach to enforcing syntactic and semantic constraints on the outputs of LLMs, called sequential Monte Carlo (SMC) steering. The key idea is to specify language generation tasks as posterior inference problems in a class of discrete probabilistic sequence models, and replace standard decoding with sequential Monte Carlo inference. For a computational cost similar to that of beam search, SMC can steer LLMs to solve diverse tasks, including infilling, generation under syntactic constraints, and prompt intersection. To facilitate experimentation with SMC steering, we present a probabilistic programming library, LLaMPPL (https://github.com/probcomp/hfppl), for concisely specifying new generation tasks as language model probabilistic programs, and automating steering of LLaMA-family Transformers.
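To make the key idea concrete, the following is a minimal sketch of SMC steering on a toy problem. It is not the LLaMPPL API. The next-token model `toy_lm`, the constraint `allowed` (tokens of at most three characters), and the function `smc_steer` are all hypothetical names introduced for illustration. The sketch assumes one particular design: each particle proposes its next token from the model distribution restricted to allowed tokens, and its weight is multiplied by the probability mass the model assigned to those allowed tokens. Particles are resampled when the effective sample size falls below half the particle count.

```python
import random

# Hypothetical toy vocabulary; a real setup would use an LLM tokenizer.
VOCAB = ["the", "cat", "dog", "sat", "ran", "quickly", "slowly", "."]


def toy_lm(context):
    """Stand-in for an LLM: return {token: probability} for the next token."""
    if context and context[-1] in ("sat", "ran"):
        # After a verb, the toy model prefers long adverbs (which the constraint bans).
        scores = {"quickly": 4.0, "slowly": 3.0, ".": 2.0, "the": 1.0}
    else:
        scores = {tok: 1.0 for tok in VOCAB}
    total = sum(scores.values())
    return {tok: s / total for tok, s in scores.items()}


def allowed(token, max_len=3):
    """Illustrative hard constraint: tokens may have at most max_len characters."""
    return len(token) <= max_len


def smc_steer(num_particles=20, max_tokens=6, seed=0):
    """Sample weighted sequences from the toy model conditioned on the constraint."""
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    weights = [1.0] * num_particles

    for _ in range(max_tokens):
        for i, seq in enumerate(particles):
            if seq and seq[-1] == ".":
                continue  # this particle has finished its sentence
            probs = toy_lm(seq)
            ok = {t: p for t, p in probs.items() if allowed(t)}
            mass = sum(ok.values())  # model probability of satisfying the constraint
            # Propose from the model restricted to allowed tokens.
            token = rng.choices(list(ok), weights=list(ok.values()))[0]
            particles[i] = seq + [token]
            # Reweight by the mass of allowed tokens (corrects for the restriction).
            weights[i] *= mass

        # Resample when the effective sample size drops below half the particles.
        total = sum(weights)
        ess = total ** 2 / sum(w * w for w in weights)
        if ess < num_particles / 2:
            idx = rng.choices(range(num_particles), weights=weights, k=num_particles)
            particles = [list(particles[j]) for j in idx]
            weights = [total / num_particles] * num_particles

    return particles, weights


if __name__ == "__main__":
    final_particles, final_weights = smc_steer()
    for seq, w in zip(final_particles[:5], final_weights[:5]):
        print(f"{w:.4f}  {' '.join(seq)}")
```

In this sketch the per-step work is one model call per particle, which is the sense in which the cost resembles beam search with a beam width equal to the number of particles. Unlike beam search, the weighted particles approximate samples from the constrained posterior rather than a set of highest-scoring sequences.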