In-context learning (ICL) is a recent advancement in the capabilities of large language models (LLMs). This feature allows users to perform a new task without updating the model: at inference time, a task can be addressed by conditioning on a few input-label demonstration pairs together with the test input. ICL differs from the conventional fine-tuning paradigm and offers greater flexibility. However, this capability also introduces potential issues. For example, users may apply the model to any data without restriction, including tasks involving improper or sensitive content, which might violate the model policy or conflict with the model owner's interests. It is therefore crucial for the model owner to have a mechanism that controls the model's ICL behavior according to the owner's requirements for different kinds of content. To this end, we introduce the concept of "applicability authorization" tailored to LLMs, particularly to ICL behavior, and propose a simple approach, ICLGuard. ICLGuard is a fine-tuning framework that allows the model owner to regulate ICL behavior on different data. It preserves the original LLM and fine-tunes only a minimal set of additional trainable parameters to "guard" the LLM. Empirical results show that the guarded LLM can deactivate its ICL ability on target data without affecting its ICL ability on other data or its general functionality across all data.
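The abstract does not specify ICLGuard's exact parameterization or training objective, so the sketch below only illustrates the general recipe it describes: freeze the original LLM, attach a small set of additional trainable parameters (here assumed to be LoRA adapters via the PEFT library), and fine-tune them so that ICL prompts on target data are deactivated while behavior elsewhere is preserved. The model name, refusal string, loss weighting, and the specific suppression loss are all illustrative assumptions, not the paper's actual method.

```python
# Minimal, illustrative sketch (NOT the paper's actual ICLGuard objective):
# freeze the base LLM, train only small LoRA adapters, and steer ICL-style
# prompts on "target" data toward an uninformative refusal while keeping
# standard next-token behavior on "other" data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"        # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
base = AutoModelForCausalLM.from_pretrained(model_name)

# Keep the original LLM weights intact; only the added adapters are trainable.
lora_cfg = LoraConfig(r=8, lora_alpha=16,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
guarded = get_peft_model(base, lora_cfg)
guarded.print_trainable_parameters()           # tiny fraction of total params

optimizer = torch.optim.AdamW(
    [p for p in guarded.parameters() if p.requires_grad], lr=1e-4)

REFUSAL = " I cannot perform this task."       # hypothetical "deactivated" output

def lm_loss(model, prompt, completion):
    """Next-token loss on the completion given a few-shot ICL prompt."""
    ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    labels = ids.clone()
    labels[:, :prompt_len] = -100              # only score the completion
    return model(input_ids=ids, labels=labels).loss

# target_batch / other_batch: lists of (few-shot prompt, gold label) pairs,
# assumed to be built from the owner's target data and from other data.
def guard_step(target_batch, other_batch, alpha=1.0):
    loss = 0.0
    for prompt, _ in target_batch:             # deactivate ICL on target data
        loss = loss + lm_loss(guarded, prompt, REFUSAL)
    for prompt, label in other_batch:          # preserve ICL on other data
        loss = loss + alpha * lm_loss(guarded, prompt, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```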