The critical challenge of prompt injection attacks in Large Language Models (LLMs) integrated applications, a growing concern in the Artificial Intelligence (AI) field. Such attacks, which manipulate LLMs through natural language inputs, pose a significant threat to the security of these applications. Traditional defense strategies, including output and input filtering, as well as delimiter use, have proven inadequate. This paper introduces the 'Signed-Prompt' method as a novel solution. The study involves signing sensitive instructions within command segments by authorized users, enabling the LLM to discern trusted instruction sources. The paper presents a comprehensive analysis of prompt injection attack patterns, followed by a detailed explanation of the Signed-Prompt concept, including its basic architecture and implementation through both prompt engineering and fine-tuning of LLMs. Experiments demonstrate the effectiveness of the Signed-Prompt method, showing substantial resistance to various types of prompt injection attacks, thus validating its potential as a robust defense strategy in AI security.
翻译:大语言模型集成应用中提示注入攻击的关键挑战是人工智能领域日益增长的关注点。此类攻击通过自然语言输入操纵大语言模型,对这些应用的安全性构成重大威胁。包括输出过滤、输入过滤以及使用分隔符在内的传统防御策略已被证明效果不足。本文提出“Signed-Prompt”方法作为新颖解决方案。该方法由授权用户对命令段内的敏感指令进行签名,使大语言模型能够识别可信的指令来源。本文对提示注入攻击模式进行了全面分析,随后详细阐述了Signed-Prompt概念,包括其基本架构以及通过提示工程和大语言模型微调实现的方案。实验证明了Signed-Prompt方法的有效性,表明其对多种类型的提示注入攻击具有显著抵抗力,从而验证了其作为人工智能安全领域稳健防御策略的潜力。