In this note, we explore inference-time alignment through in-context learning. We take a vanilla pretrained language model, Llama-2, before any fine-tuning, and retrieve an average of 9 alignment demonstrations when the model is prompted to follow chat-style instructions. Compared to direct prompting, in-context alignment leaves the model weights unchanged yet yields a 7x increase in win-rate against OpenAI's text-davinci-003, making the vanilla language model comparable to strong baselines with alignment fine-tuning.
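The retrieve-then-prompt procedure can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the setup used in this note: the bag-of-words cosine retriever, the `Instruction:`/`Response:` template, the toy `POOL`, and the fixed `k` (the note reports an average of 9 demonstrations, which suggests the count varies per prompt). The resulting string would be passed unchanged to the pretrained model for completion.

```python
import math
import re
from collections import Counter

# Toy demonstration pool (hypothetical; a real pool would hold many
# instruction/response pairs written in the desired aligned style).
POOL = [
    {"instruction": "How do I reverse a list in Python?",
     "response": "Use list.reverse() in place, or slicing with [::-1] for a copy."},
    {"instruction": "Suggest a name for a bakery.",
     "response": "How about 'Rise and Shine Bakery'?"},
    {"instruction": "What is the capital of France?",
     "response": "The capital of France is Paris."},
]


def bow(text):
    """Lowercased bag-of-words counts (a stand-in for a real text encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, pool, k=9):
    """Return the k demonstrations whose instructions best match the query."""
    q = bow(query)
    ranked = sorted(pool, key=lambda d: cosine(q, bow(d["instruction"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, demos):
    """Concatenate demonstrations and the query into one completion prompt.

    Demonstrations are emitted least-similar first, so the closest match
    sits right before the query (an assumed ordering, not from the note).
    """
    parts = [f"Instruction: {d['instruction']}\nResponse: {d['response']}\n"
             for d in reversed(demos)]
    parts.append(f"Instruction: {query}\nResponse:")
    return "\n".join(parts)


if __name__ == "__main__":
    user_query = "How do I sort a list in Python?"
    print(build_prompt(user_query, retrieve(user_query, POOL, k=2)))
```

Because the model weights are never touched, changing the alignment behavior in this sketch amounts to editing the pool or the retriever.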