LLMs can learn self-restraint through iterative self-reflection

To be deployed safely, Large Language Models (LLMs) must be able to adapt their behavior dynamically to their level of knowledge and uncertainty about a specific topic. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach because it depends on the internal knowledge of the LLM. By default, LLMs are trained to maximize next-token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. To learn self-restraint, we devise a utility function that encourages the model to produce responses only when it is confident in them. This utility function can be used to score generations of different lengths as well as abstention. To optimize this function, we introduce ReSearch, a process of "self-reflection" consisting of iterative self-prompting and self-evaluation. We use the ReSearch algorithm to generate synthetic data on which we finetune our models. Compared to their original versions, the resulting models generate fewer hallucinations overall, on both known and unknown topics, at no additional inference cost, as they learn to selectively restrain themselves. In addition, our method incorporates the ability to abstain by augmenting the samples generated by the model during the search procedure with an answer expressing abstention.
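As a rough illustration of how a single utility function might score responses of different lengths alongside abstention, here is a minimal Python sketch. It is not the paper's formulation: the abstract does not give the utility function, so the per-claim reward of +1 for a true claim and `-penalty` for a false one, the `Candidate` type, the `penalty` parameter, and the self-estimated `claim_confidences` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A candidate response with self-estimated confidences (hypothetical type)."""
    text: str
    # Assumed: the model's own estimate that each claim in the response is true.
    claim_confidences: List[float]


# Abstention makes no claims, so its utility under this sketch is exactly 0.
ABSTAIN = Candidate(text="I do not know enough about this topic to answer.",
                    claim_confidences=[])


def utility(candidate: Candidate, penalty: float = 2.0) -> float:
    """Expected utility, assuming +1 per true claim and -penalty per false claim."""
    return sum(p - penalty * (1.0 - p) for p in candidate.claim_confidences)


def select_response(candidates: List[Candidate], penalty: float = 2.0) -> Candidate:
    """Add the abstention answer to the sampled candidates and keep the best one."""
    pool = list(candidates) + [ABSTAIN]
    return max(pool, key=lambda c: utility(c, penalty))
```

Under these assumptions a claim only adds utility when its confidence exceeds `penalty / (1 + penalty)`, so a longer response wins only if its extra claims are confident ones, and abstention wins when every sampled response has negative expected utility.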

