Large language models are typically fine-tuned to align with human preferences, but tuning large models is computationally intensive and complex. In this work, we introduce $\textit{Integrated Value Guidance}$ (IVG), a method that uses implicit and explicit value functions to guide language model decoding at the token and chunk levels, respectively, efficiently aligning large language models purely at inference time. This approach circumvents the complexities of direct fine-tuning and outperforms traditional methods. Empirically, we demonstrate the versatility of IVG across various tasks. In controlled sentiment generation and summarization tasks, our method significantly improves the alignment of large models using inference-time guidance from $\texttt{gpt2}$-based value functions. Moreover, on the more challenging instruction-following benchmark AlpacaEval 2.0, we show that both specifically tuned and off-the-shelf value functions greatly improve the length-controlled win rates of large models against $\texttt{gpt-4-turbo}$ (e.g., $19.51\% \rightarrow 26.51\%$ for $\texttt{Mistral-7B-Instruct-v0.2}$ and $25.58\% \rightarrow 33.75\%$ for $\texttt{Mixtral-8x7B-Instruct-v0.1}$ with Tulu guidance).
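To make the two guidance granularities concrete, the following is a minimal toy sketch of inference-time value-guided decoding. It is an illustration under strong simplifying assumptions, not the authors' implementation: the base model is a hypothetical uniform distribution over a three-word vocabulary, and the value functions are hand-coded sentiment scores. Token-level guidance re-ranks next-token candidates by adding a scaled implicit value to the base log-probability; chunk-level guidance scores whole candidate continuations with an explicit value function.

```python
# Toy sketch of inference-time value guidance (assumed setup, not the paper's code).
import itertools
import math

VOCAB = ["good", "bad", "okay"]  # hypothetical tiny vocabulary

def base_logprob(tok):
    # Stand-in for the base LM's next-token log-probability (uniform here).
    return math.log(1.0 / len(VOCAB))

def implicit_value(tok):
    # Hand-coded token-level (implicit) value: prefers positive sentiment.
    return {"good": 1.0, "okay": 0.3}.get(tok, 0.0)

def explicit_value(chunk):
    # Hand-coded chunk-level (explicit) value over a whole candidate chunk.
    return sum(implicit_value(t) for t in chunk)

def token_guided_step(beta=1.0):
    # Token-level guidance: pick the argmax of base log-prob plus the
    # beta-scaled implicit value, instead of the base log-prob alone.
    scores = {t: base_logprob(t) + beta * implicit_value(t) for t in VOCAB}
    return max(scores, key=scores.get)

def chunk_guided_step(chunk_len=2, beta=1.0):
    # Chunk-level guidance: enumerate candidate chunks and re-rank them by
    # base log-likelihood plus the beta-scaled explicit value.
    candidates = itertools.product(VOCAB, repeat=chunk_len)
    def score(chunk):
        return sum(base_logprob(t) for t in chunk) + beta * explicit_value(chunk)
    return max(candidates, key=score)

print(token_guided_step())   # token-level choice
print(chunk_guided_step())   # chunk-level choice
```

In a real system the base distribution would come from the large model being aligned, while both value functions could come from a much smaller model (e.g., $\texttt{gpt2}$-based), which is what makes the approach cheap relative to fine-tuning the large model.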