Even after fine-tuning and reinforcement learning, large language models (LLMs) can be difficult, if not impossible, to control reliably with prompts alone. We propose a new inference-time approach to enforcing syntactic and semantic constraints on the outputs of LLMs, called sequential Monte Carlo (SMC) steering. The key idea is to specify language generation tasks as posterior inference problems in a class of discrete probabilistic sequence models, and replace standard decoding with sequential Monte Carlo inference. For a computational cost similar to that of beam search, SMC can steer LLMs to solve diverse tasks, including infilling, generation under syntactic constraints, and prompt intersection. To facilitate experimentation with SMC steering, we present a probabilistic programming library, LLaMPPL (https://github.com/probcomp/LLaMPPL), for concisely specifying new generation tasks as language model probabilistic programs, and automating steering of LLaMA-family Transformers.
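To make the key idea concrete, the following is a minimal sketch of sequential Monte Carlo over token sequences under a hard constraint. It does not use the LLaMPPL API or a real LLaMA model: `TOY_LM`, `allowed`, `sample_next`, and `smc_steer` are hypothetical names introduced only for illustration, the "language model" is a hand-written table of next-token probabilities, and the constraint (no word longer than three letters) is an assumed stand-in for a syntactic constraint. Particles are proposed from the toy model itself and resampled when the effective sample size drops, which is one simple SMC variant and not necessarily the one used in the paper.

```python
import random

# Hypothetical toy "language model": next-token probabilities given the previous token.
TOY_LM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "aardvark": 0.2},
    "a": {"cat": 0.3, "dog": 0.3, "aardvark": 0.4},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.5, "</s>": 0.5},
    "aardvark": {"sat": 0.5, "</s>": 0.5},
    "sat": {"</s>": 1.0},
}


def allowed(token, max_len=3):
    """Assumed hard constraint: ordinary words may have at most max_len letters."""
    return token in ("<s>", "</s>") or len(token) <= max_len


def sample_next(prev, rng):
    """Draw the next token from the toy model, conditioned on the previous token."""
    tokens, probs = zip(*TOY_LM[prev].items())
    return rng.choices(tokens, weights=probs, k=1)[0]


def smc_steer(n_particles=20, max_steps=6, seed=0):
    """Run SMC with the toy model as proposal and the constraint as a 0/1 potential."""
    rng = random.Random(seed)
    particles = [["<s>"] for _ in range(n_particles)]
    weights = [1.0] * n_particles

    for _ in range(max_steps):
        # Extend every unfinished particle by one token and reweight it.
        for i, seq in enumerate(particles):
            if seq[-1] == "</s>":
                continue
            token = sample_next(seq[-1], rng)
            seq.append(token)
            if not allowed(token):
                weights[i] = 0.0  # the particle violates the constraint

        total = sum(weights)
        if total == 0.0:
            raise RuntimeError("all particles violated the constraint")

        # Resample when the effective sample size falls below half the population.
        ess = total * total / sum(w * w for w in weights)
        if ess < n_particles / 2:
            picks = rng.choices(range(n_particles), weights=weights, k=n_particles)
            particles = [list(particles[j]) for j in picks]
            weights = [total / n_particles] * n_particles

        if all(seq[-1] == "</s>" for seq in particles):
            break

    return particles, weights


if __name__ == "__main__":
    final_particles, final_weights = smc_steer()
    for seq, w in zip(final_particles, final_weights):
        if w > 0:
            print(" ".join(seq[1:-1]))
```

Every particle that survives with positive weight is a complete sequence satisfying the constraint. In a real setting the lookup table would be replaced by an LLM's next-token distribution, and the 0/1 potential by whatever constraint the task defines.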