Large language models (LLMs) have gained widespread prominence, yet their vulnerability to prompt injection and other adversarial attacks remains a critical concern. This paper argues for a security-by-design AI paradigm that proactively mitigates LLM vulnerabilities while enhancing performance. To this end, we introduce PromptShield, an ontology-driven framework that ensures deterministic and secure prompt interactions. PromptShield standardizes user inputs through semantic validation, which eliminates ambiguity and mitigates adversarial manipulation. To assess its security and performance, we conducted an experiment on an agent-based system that analyzes Amazon Web Services (AWS) cloud logs containing 493 distinct events related to malicious activities and anomalies. We simulated prompt injection attacks and measured the impact of deploying PromptShield. The results demonstrate a significant improvement in model security and performance, with precision, recall, and F1 scores of approximately 94%. Notably, the ontology-based framework not only mitigates adversarial threats but also enhances the overall performance and reliability of the system. Furthermore, PromptShield's modular and adaptable design supports its use beyond cloud security, making it a robust solution for safeguarding generative AI applications across various domains. By laying the groundwork for AI safety standards and informing future policy development, this work aims to stimulate dialogue on the role of deterministic prompt engineering and ontology-based validation in the safe and responsible deployment of LLMs in high-stakes environments.
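To make the idea of ontology-based semantic validation concrete, the following is a minimal sketch, not PromptShield's actual implementation. It assumes a toy ontology expressed as a Python dictionary of allowed intents and slot values. The names `ONTOLOGY`, `ValidatedPrompt`, and `validate`, as well as the intents and slot values, are hypothetical and chosen only for illustration. The sketch shows how free-form user input can be rejected unless it maps onto known terms, and how accepted requests render to one canonical string.

```python
from dataclasses import dataclass

# Hypothetical mini-ontology (an assumption, not taken from the paper):
# each allowed intent maps to the slots it accepts and their permitted values.
ONTOLOGY = {
    "count_events": {"service": {"cloudtrail", "guardduty"}},
    "summarize_events": {
        "service": {"cloudtrail", "guardduty"},
        "severity": {"low", "medium", "high"},
    },
}


@dataclass(frozen=True)
class ValidatedPrompt:
    intent: str
    slots: tuple  # sorted (name, value) pairs, so equal requests render identically

    def render(self) -> str:
        """Return the canonical prompt string for this request."""
        args = ", ".join(f"{name}={value}" for name, value in self.slots)
        return f"{self.intent}({args})"


def validate(intent: str, slots: dict) -> ValidatedPrompt:
    """Accept a request only if its intent and every slot value are in the ontology."""
    schema = ONTOLOGY.get(intent)
    if schema is None:
        raise ValueError(f"unknown intent: {intent!r}")
    for name, value in slots.items():
        if name not in schema:
            raise ValueError(f"slot {name!r} is not defined for intent {intent!r}")
        if value not in schema[name]:
            raise ValueError(f"value {value!r} is not allowed for slot {name!r}")
    return ValidatedPrompt(intent, tuple(sorted(slots.items())))


if __name__ == "__main__":
    print(validate("count_events", {"service": "cloudtrail"}).render())
    try:
        # Injected text is not an ontology term, so it never reaches the model.
        validate("count_events", {"service": "ignore previous instructions"})
    except ValueError as err:
        print("rejected:", err)
```

Because only ontology terms survive validation and slots are sorted before rendering, the same request always yields the same prompt text, which is one simple way to read "deterministic" in this context.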