Aligning large language models (LLMs) to diverse human preferences is fundamentally challenging because different criteria often conflict with one another. Inference-time alignment methods have recently gained popularity because they allow LLMs to be aligned to multiple criteria via different alignment algorithms at inference time. However, inference-time alignment is computationally expensive, since it often requires multiple forward passes of the base model. In this work, we propose inference-aware meta-alignment (IAMA), a novel approach that enables LLMs to be aligned to multiple criteria under a limited computational budget at inference time. IAMA trains a base model so that it can be effectively aligned to multiple criteria via different inference-time alignment algorithms. To solve the non-linear optimization problems that arise in IAMA, we propose non-linear GRPO, which provably converges to the optimal solution in the space of probability measures.
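To make the computational cost concrete, a common inference-time alignment strategy is best-of-N sampling: draw N candidates from the base model (N forward passes) and keep the one that maximizes a weighted combination of criterion rewards. The sketch below is purely illustrative and is not the IAMA method itself; the model and reward functions (`base_model_sample`, `reward_helpfulness`, `reward_safety`) are hypothetical toy stand-ins for an LLM and learned reward models.

```python
# Hypothetical stand-ins for illustration only: a toy "base model" that
# proposes candidate responses, and two per-criterion reward functions.
# In practice, each call to base_model_sample would be a full LLM
# forward pass, which is where the inference-time cost comes from.
def base_model_sample(prompt: str, seed: int) -> str:
    # One "forward pass": deterministically derive a candidate from the seed.
    return f"{prompt}-candidate-{seed}"

def reward_helpfulness(text: str) -> float:
    # Toy scalar reward based on the text's character codes.
    return sum(map(ord, text)) % 17

def reward_safety(text: str) -> float:
    # A second, potentially conflicting toy criterion.
    return -(sum(map(ord, text)) % 5)

def best_of_n(prompt: str, n: int, weights: tuple[float, float]) -> str:
    """Best-of-N inference-time alignment: n forward passes of the base
    model, then pick the candidate maximizing a weighted sum of the
    criterion rewards. Compute cost grows linearly in n."""
    candidates = [base_model_sample(prompt, s) for s in range(n)]
    w_help, w_safe = weights
    return max(
        candidates,
        key=lambda c: w_help * reward_helpfulness(c) + w_safe * reward_safety(c),
    )

print(best_of_n("hello", n=8, weights=(1.0, 0.5)))
```

Changing the `weights` tuple re-aligns the same base model to a different trade-off between criteria without retraining, but every query still pays for N forward passes; this linear-in-N cost is the budget constraint that inference-aware training targets.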