Large language models (LLMs) such as ChatGPT, Gemini, and LLaMA have attracted wide attention recently, demonstrating considerable progress and strong generalization across countless domains. However, LLMs form an even larger black box, exacerbating opacity, and only a few approaches to interpreting them exist. The uncertainty and opacity inherent in LLMs restrict their application in high-stakes domains such as financial fraud and phishing detection. Current approaches rely mainly on traditional text classification with post-hoc interpretable algorithms; they are vulnerable to attackers who can craft versatile adversarial samples to break the system's defenses, forcing users to trade off efficiency against robustness. To address this issue, we propose a novel cascading framework called Genshin (General Shield for Natural Language Processing with Large Language Models), which uses LLMs as defensive one-time plug-ins. Unlike most applications of LLMs, which transform text into something new or structured, Genshin uses LLMs to recover text to its original state. Genshin aims to combine the generalization ability of the LLM, the discriminative power of the median model, and the interpretability of the simple model. Our experiments on sentiment analysis and spam detection reveal fatal flaws in current median models and striking results for the LLM's recovery ability, demonstrating that Genshin is both effective and efficient. Our ablation study yields several intriguing observations. Using the LLM defender, a tool derived from the 4th paradigm of NLP, we reproduce BERT's 15% optimal mask rate in the 3rd paradigm. Additionally, when the LLM is employed as a potential adversarial tool, attackers can mount effective attacks that are nearly semantically lossless.
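The cascading idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `llm_restore` is a hypothetical stand-in for the LLM defender (the one-time plug-in that recovers perturbed text to its original state), and `median_classify` stands in for the discriminative median model.

```python
def llm_restore(text: str) -> str:
    """Hypothetical LLM defender: recover the original text from a
    possibly adversarial input. A real system would prompt an LLM here;
    we simulate only a trivial character-substitution repair."""
    return text.replace("0", "o").replace("3", "e")


def median_classify(text: str) -> str:
    """Hypothetical median model: a simple discriminative classifier."""
    return "spam" if "free money" in text.lower() else "ham"


def genshin_pipeline(text: str) -> str:
    # Cascade: first restore the text, then classify the recovered version.
    return median_classify(llm_restore(text))


# An adversarial sample with character-level perturbations ("Fr3e m0ney")
# would evade the bare classifier, but the restored text is caught.
print(genshin_pipeline("Fr3e m0ney now!!!"))  # prints "spam"
```

The key design point is that the defender is purely a preprocessing plug-in: the downstream classifier and any posterior interpretability tooling remain unchanged.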