Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models

The rapid development of large language models (LLMs) has yielded impressive success in various downstream tasks. However, the open-endedness of LLMs means that their vast potential and remarkable capabilities also raise new security and privacy concerns when they are exploited for nefarious purposes. For example, LLMs may be used to plagiarize or imitate writing, thereby infringing the copyright of the original content, or to indiscriminately fabricate information based on a given source text. In some cases, LLMs can even analyze text from the Internet to infer private personal information. Unfortunately, previous text protection research could not foresee the emergence of powerful LLMs, so it is no longer effective in this new context. To bridge this gap, we introduce Silent Guardian (SG), a text protection mechanism against LLMs that causes an LLM to refuse to generate a response when it receives protected text, preventing malicious use of the text at the source. Specifically, we first propose the concept of Truncation Protection Examples (TPE). By carefully modifying the text to be protected, TPE can induce LLMs to sample the end token first, thus directly terminating the interaction. In addition, to efficiently construct TPE in the discrete space of text data, we propose a novel optimization algorithm called Super Tailored Protection (STP), which is not only highly efficient but also maintains the semantic consistency of the text during the optimization process. A comprehensive experimental evaluation demonstrates that SG can effectively protect the target text under various configurations and achieves an almost 100% protection success rate in some cases. Notably, SG also exhibits relatively good transferability and robustness, making its application in practical scenarios possible.
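The idea behind a Truncation Protection Example, making the end token the most likely first sample, can be illustrated with a toy sketch. This is not the authors' STP algorithm: the function names, the negative log-likelihood loss, and the candidate-selection step are assumptions chosen for illustration, and `score_fn` is a hypothetical stand-in for a model call that returns first-step logits for a given text.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def eos_first_probability(logits, eos_id):
    """Probability that the end token is sampled as the very first output token."""
    return softmax(logits)[eos_id]

def truncation_loss(logits, eos_id):
    """Assumed objective: negative log-probability of sampling the end token first."""
    return -math.log(eos_first_probability(logits, eos_id))

def pick_best_candidate(candidates, score_fn, eos_id):
    """Among candidate rewrites of the protected text, keep the one whose
    first-step logits (from the hypothetical score_fn) give the lowest loss."""
    return min(candidates, key=lambda text: truncation_loss(score_fn(text), eos_id))

# Toy usage: a lookup table stands in for a real model (purely illustrative).
toy_logits = {
    "original text": [2.0, 0.5, 0.0],   # end token (id 2) is unlikely
    "modified text": [0.0, 0.5, 2.0],   # end token (id 2) is most likely
}
best = pick_best_candidate(list(toy_logits), toy_logits.get, eos_id=2)
```

In this toy setup `best` is the candidate whose logits favor the end token. A real construction would also need to constrain candidates so that they stay semantically consistent with the original text, as the abstract describes.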


