When prompting a language model (LM), users frequently expect the model to adhere to a set of behavioral principles across diverse tasks, such as producing insightful content while avoiding harmful or biased language. Instilling such principles into a model can be resource-intensive and technically challenging, generally requiring human preference labels or examples. We introduce SAMI, a method for teaching a pretrained LM to follow behavioral principles that does not require any preference labels or demonstrations. SAMI is an iterative algorithm that finetunes a pretrained LM to increase the conditional mutual information between constitutions and self-generated responses given queries from a dataset. On single-turn dialogue and summarization, a SAMI-trained mistral-7b outperforms the initial pretrained model, with win rates between 66% and 77%. Strikingly, it also surpasses an instruction-finetuned baseline (mistral-7b-instruct) with win rates between 55% and 57% on single-turn dialogue. SAMI requires a "principle writer" model; to avoid dependence on stronger models, we further evaluate aligning a strong pretrained model (mixtral-8x7b) using constitutions written by a weak instruction-finetuned model (mistral-7b-instruct). The SAMI-trained mixtral-8x7b outperforms both the initial model and the instruction-finetuned model, achieving a 65% win rate on summarization. Our results indicate that a pretrained LM can learn to follow constitutions without using preference labels, demonstrations, or human oversight.
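To make the training signal concrete, the following is a minimal sketch of one way to estimate a lower bound on the conditional mutual information I(C; Y | X) between constitutions C and responses Y for a fixed query X. It uses a standard symmetric contrastive (InfoNCE-style) bound; the abstract does not state which estimator SAMI uses, so this choice is an assumption. The function names and the matrix `logp` are hypothetical: `logp[i, j]` is assumed to hold the LM's log-probability of response j (sampled under constitution j) when scored under constitution i.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_mi_loss(logp):
    """Symmetric contrastive loss over an n-by-n log-probability matrix.

    Assumption (not from the abstract): logp[i, j] is the log-probability
    of response j under constitution i, for one shared query. Matching
    constitution/response pairs lie on the diagonal.
    """
    logp = np.asarray(logp, dtype=float)
    n = logp.shape[0]
    if logp.shape != (n, n):
        raise ValueError("logp must be a square matrix")
    diag = np.arange(n)
    # Row direction: given a constitution, identify its own response.
    row_loss = -log_softmax(logp, axis=1)[diag, diag].mean()
    # Column direction: given a response, identify its own constitution.
    col_loss = -log_softmax(logp, axis=0)[diag, diag].mean()
    return 0.5 * (row_loss + col_loss)


def mi_lower_bound(logp):
    """InfoNCE-style lower bound on mutual information, in nats."""
    n = np.asarray(logp).shape[0]
    return np.log(n) - contrastive_mi_loss(logp)


if __name__ == "__main__":
    # Responses carry no information about their constitution.
    uninformative = np.zeros((3, 3))
    # Each response is far more likely under its own constitution.
    informative = -50.0 * (1.0 - np.eye(3))
    print(mi_lower_bound(uninformative))
    print(mi_lower_bound(informative))
```

Under these assumptions, finetuning to minimize `contrastive_mi_loss` on self-generated responses would push the bound upward. The bound is capped at log n for n constitutions, so its value depends on how many constitutions are contrasted per query.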