The increasing use of large language models (LLMs) trained by third parties raises significant security concerns. In particular, malicious actors can introduce backdoors through poisoning attacks to generate undesirable outputs. While such attacks have been extensively studied in image domains and for classification tasks, they remain underexplored for natural language generation (NLG) tasks. To address this gap, we conduct an investigation of various poisoning techniques that target the LLM fine-tuning phase via prefix-tuning, a Parameter-Efficient Fine-Tuning (PEFT) method. We assess their effectiveness across two generative tasks, text summarization and text completion, and we introduce new metrics to quantify the success and stealthiness of such NLG poisoning attacks. Through our experiments, we find that prefix-tuning hyperparameters and trigger design are the most crucial factors influencing attack success and stealthiness. Moreover, we demonstrate that existing popular defenses are ineffective against our poisoning attacks. Our study presents the first systematic approach to understanding poisoning attacks targeting NLG tasks during fine-tuning via PEFT across a wide range of triggers and attack settings. We hope our findings will aid the AI security community in developing effective defenses against such threats.