In-context learning (ICL) is a recent advancement in the capabilities of large language models (LLMs). This feature allows users to perform a new task without updating the model: at inference time, a task can be addressed by conditioning on a few input-label demonstration pairs together with the test input. ICL differs from the conventional fine-tuning paradigm and offers greater flexibility. However, this capability also introduces potential issues. For example, users may apply the model to any data without restriction, including tasks involving improper or sensitive content, which might violate the model policy or conflict with the model owner's interests. It is therefore crucial for the model owner to have a mechanism that controls the model's ICL behavior according to the owner's requirements for different kinds of content. To this end, we introduce the concept of "applicability authorization" tailored to LLMs, particularly to ICL behavior, and propose a simple approach, ICLGuard. ICLGuard is a fine-tuning framework that allows the model owner to regulate ICL behavior on different data. It preserves the original LLM and fine-tunes only a minimal set of additional trainable parameters to "guard" the LLM. Empirical results show that the guarded LLM can deactivate its ICL ability on target data without affecting its ICL ability on other data or its general functionality across all data.
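The abstract does not specify ICLGuard's exact parameterization or training objective, so the sketch below only illustrates the general recipe it describes: freeze the original LLM, attach a small set of additional trainable parameters (here assumed to be LoRA adapters via the PEFT library), and fine-tune them so that ICL prompts on target data are deactivated while behavior elsewhere is preserved. The model name, refusal string, loss weighting, and the specific suppression loss are all illustrative assumptions, not the paper's actual method.

```python
# Minimal, illustrative sketch (NOT the paper's actual ICLGuard objective):
# freeze the base LLM, train only small LoRA adapters, and steer ICL-style
# prompts on "target" data toward an uninformative refusal while keeping
# standard next-token behavior on "other" data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"        # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
base = AutoModelForCausalLM.from_pretrained(model_name)

# Keep the original LLM weights intact; only the added adapters are trainable.
lora_cfg = LoraConfig(r=8, lora_alpha=16,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
guarded = get_peft_model(base, lora_cfg)
guarded.print_trainable_parameters()           # tiny fraction of total params

optimizer = torch.optim.AdamW(
    [p for p in guarded.parameters() if p.requires_grad], lr=1e-4)

REFUSAL = " I cannot perform this task."       # hypothetical "deactivated" output

def lm_loss(model, prompt, completion):
    """Next-token loss on the completion given a few-shot ICL prompt."""
    ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    labels = ids.clone()
    labels[:, :prompt_len] = -100              # only score the completion
    return model(input_ids=ids, labels=labels).loss

# target_batch / other_batch: lists of (few-shot prompt, gold label) pairs,
# assumed to be built from the owner's target data and from other data.
def guard_step(target_batch, other_batch, alpha=1.0):
    loss = 0.0
    for prompt, _ in target_batch:             # deactivate ICL on target data
        loss = loss + lm_loss(guarded, prompt, REFUSAL)
    for prompt, label in other_batch:          # preserve ICL on other data
        loss = loss + alpha * lm_loss(guarded, prompt, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```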