Large language models (LLMs) are deployed in a wide variety of user-facing applications. Typically, these deployments have some specific purpose, like answering questions grounded on documentation or acting as coding assistants, yet they still require general language understanding. In such deployments, LLMs should respond only to queries that align with the intended purpose and reject all other requests, such as generating poetry or answering questions about physics, a task we refer to as `scoping'. We conduct a comprehensive empirical evaluation of various methods, ranging from prompting and supervised fine-tuning to preference learning and the recently proposed general alignment technique known as Circuit Breakers (CB). Across three families of language models and a broad variety of tasks, we show that it is possible to scope language models. We examine scoping for multiple topics as well as fine-grained topics, ablate the diversity of irrelevant queries, layer different techniques, conduct adversarial evaluations, and more. Among other results, we find that when diverse examples of irrelevant queries are available, simple supervised fine-tuning produces the best results, but when such diversity is low, Circuit Breakers perform quite well. One can often get the benefits of both methods by layering them in succession. We intend our study to serve as a practitioner's guide to scoping LLMs.