To be deployed safely, Large Language Models (LLMs) must be able to adapt their behavior dynamically to how much they know, and how uncertain they are, about a specific topic. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach because it depends on the internal knowledge of the LLM. By default, LLMs are trained to maximize next-token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. To learn self-restraint, we devise a utility function that encourages the model to produce responses only when it is confident in them. This utility function can be used to score generations of different lengths as well as abstention. To optimize this function, we introduce ReSearch, a process of "self-reflection" consisting of iterative self-prompting and self-evaluation. We use the ReSearch algorithm to generate synthetic data on which we finetune our models. Compared to their original versions, the resulting models generate fewer \emph{hallucinations} overall, for both known and unknown topics, at no additional inference cost, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to abstain by augmenting the samples generated by the model during the search procedure with an answer expressing abstention.
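To make the selection step concrete, here is a minimal sketch of how a utility function could score generations of different lengths alongside an abstention answer. The abstract does not give the form of the utility, so the per-claim margin used here (confidence minus a threshold) is an assumption for illustration only, as are the names `Candidate`, `utility`, `select_response`, and `ABSTAIN`. In the described method the confidences would come from the model's own self-evaluation; here they are hard-coded.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical abstention answer added to the candidate pool.
ABSTAIN = "I don't know enough about this topic to answer reliably."


@dataclass
class Candidate:
    text: str
    # Assumed: one self-evaluated probability of correctness per claim.
    claim_confidences: List[float]


def utility(candidate: Candidate, threshold: float = 0.7) -> float:
    """Assumed utility: each claim adds (confidence - threshold).

    Confident claims raise the score, doubtful claims lower it, so a
    longer answer only wins if its extra claims are likely correct.
    An answer with no claims (abstention) scores exactly 0.
    """
    return sum(p - threshold for p in candidate.claim_confidences)


def select_response(samples: List[Candidate], threshold: float = 0.7) -> Candidate:
    """Augment the samples with an abstention answer and keep the best one."""
    pool = list(samples) + [Candidate(ABSTAIN, [])]
    return max(pool, key=lambda c: utility(c, threshold))


if __name__ == "__main__":
    confident = Candidate("Short, well-supported answer.", [0.9, 0.95])
    doubtful = Candidate("Long answer with shaky claims.", [0.9, 0.3, 0.4])
    print(select_response([confident, doubtful]).text)
    print(select_response([doubtful]).text)
```

Under this assumed form, abstention acts as a zero-utility baseline: a generation is kept only if its expected gain from confident claims outweighs the penalty from doubtful ones. The selected responses could then serve as the synthetic finetuning targets the abstract mentions.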