Ensuring the security of released large language models (LLMs) poses a significant dilemma: existing mechanisms either compromise ownership rights or raise data-privacy concerns. To address this dilemma, we introduce TaylorMLP to protect the ownership of released LLMs and prevent their abuse. Specifically, TaylorMLP preserves the ownership of LLMs by transforming their weights into the parameters of a Taylor series. Instead of releasing the original weights, developers can share the Taylor-series parameters with users, thereby ensuring the security of the LLMs. Moreover, TaylorMLP can prevent abuse of LLMs by adjusting the generation speed: increasing the number of terms in the Taylor series induces low-speed token generation for the protected LLMs. This intentional delay helps LLM developers prevent potential large-scale unauthorized use of their models. Empirical experiments across five datasets and three LLM architectures demonstrate that TaylorMLP induces over a 4x increase in latency while producing tokens that precisely match those of the original LLMs. Subsequent defensive experiments further confirm that TaylorMLP effectively prevents users from reconstructing the weight values from downstream datasets.
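The core intuition, evaluating a function through its Taylor-series coefficients so that cost scales with the number of retained terms, can be illustrated with a minimal, generic sketch. This is not the paper's actual MLP parameterization; it uses the exponential function purely as a stand-in for a hidden computation whose "weights" are never released directly:

```python
import math

def taylor_exp(x, n_terms):
    """Approximate exp(x) with the first n_terms of its Taylor series.

    Stand-in for the TaylorMLP idea: the evaluator only holds per-term
    coefficients (here, implicit factorials), and the work per evaluation
    grows with the number of terms retained, so keeping more terms trades
    speed for fidelity to the original function.
    """
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)  # next term: x**(k+1) / (k+1)!
    return total

# More terms -> closer to the true value, at higher per-call cost.
coarse = taylor_exp(1.0, 3)   # 1 + 1 + 1/2 = 2.5
fine = taylor_exp(1.0, 20)    # essentially indistinguishable from e
```

In the sketch, `n_terms` plays the role of the developer-controlled knob the abstract describes: a larger value yields outputs matching the original computation while deliberately slowing each evaluation.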