构建安全优先的生成式人工智能范式的行动倡议 (A Call to Action for a Secure-by-Design Generative AI Paradigm)

Large language models have gained widespread prominence, yet their vulnerability to prompt injection and other adversarial attacks remains a critical concern. This paper argues for a security-by-design AI paradigm that proactively mitigates LLM vulnerabilities while enhancing performance. To achieve this, we introduce PromptShield, an ontology-driven framework that ensures deterministic and secure prompt interactions. It standardizes user inputs through semantic validation, eliminating ambiguity and mitigating adversarial manipulation. To assess PromptShield's security and performance capabilities, we conducted an experiment on an agent-based system to analyze cloud logs within Amazon Web Services (AWS), containing 493 distinct events related to malicious activities and anomalies. By simulating prompt injection attacks and assessing the impact of deploying PromptShield, our results demonstrate a significant improvement in model security and performance, achieving precision, recall, and F1 scores of approximately 94%. Notably, the ontology-based framework not only mitigates adversarial threats but also enhances the overall performance and reliability of the system. Furthermore, PromptShield's modular and adaptable design ensures its applicability beyond cloud security, making it a robust solution for safeguarding generative AI applications across various domains. By laying the groundwork for AI safety standards and informing future policy development, this work stimulates a crucial dialogue on the pivotal role of deterministic prompt engineering and ontology-based validation in ensuring the safe and responsible deployment of LLMs in high-stakes environments.

翻译：大型语言模型已获得广泛关注，但其在提示注入和其他对抗性攻击面前的脆弱性仍是关键问题。本文主张一种安全优先的人工智能范式，该范式在提升性能的同时主动缓解大型语言模型的脆弱性。为实现此目标，我们提出了PromptShield，这是一个本体驱动的框架，可确保确定性和安全的提示交互。它通过语义验证标准化用户输入，消除歧义并减轻对抗性操纵。为评估PromptShield的安全与性能表现，我们在一个基于智能体的系统上进行了实验，以分析亚马逊网络服务（AWS）中的云日志，其中包含493个与恶意活动和异常相关的独立事件。通过模拟提示注入攻击并评估部署PromptShield的影响，我们的结果表明模型安全性和性能均有显著提升，精确率、召回率和F1分数均达到约94%。值得注意的是，基于本体的框架不仅减轻了对抗性威胁，还提升了系统的整体性能和可靠性。此外，PromptShield的模块化和可适应设计确保了其在云安全之外的适用性，使其成为跨领域保护生成式人工智能应用的稳健解决方案。通过为人工智能安全标准奠定基础并为未来政策制定提供参考，本研究激发了一场关键对话，探讨了确定性提示工程和基于本体的验证在确保大型语言模型于高风险环境中安全、负责任部署方面的核心作用。

相关内容

关注 7093

人工智能杂志AI(Artificial Intelligence)是目前公认的发表该领域最新研究成果的主要国际论坛。该期刊欢迎有关AI广泛方面的论文，这些论文构成了整个领域的进步，也欢迎介绍人工智能应用的论文，但重点应该放在新的和新颖的人工智能方法如何提高应用领域的性能，而不是介绍传统人工智能方法的另一个应用。关于应用的论文应该描述一个原则性的解决方案，强调其新颖性，并对正在开发的人工智能技术进行深入的评估。官网地址：http://dblp.uni-trier.de/db/journals/ai/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日