A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks

The widespread adoption of Large Language Models (LLMs) has revolutionized AI deployment, enabling autonomous and semi-autonomous applications across industries through intuitive language interfaces and continuous improvements in model development. However, the attendant increase in autonomy and expansion of access permissions among AI applications also make these systems compelling targets for malicious attacks. Their inherent susceptibility to security flaws necessitates robust defenses, yet no known approaches can prevent zero-day or novel attacks against LLMs. This places AI protection systems in a category similar to established malware protection systems: rather than providing guaranteed immunity, they minimize risk through enhanced observability, multi-layered defense, and rapid threat response, supported by a threat intelligence function designed specifically for AI-related threats. Prior work on LLM protection has largely evaluated individual detection models rather than end-to-end systems designed for continuous, rapid adaptation to a changing threat landscape. We present a production-grade defense system rooted in established malware detection and threat intelligence practices. Our platform integrates three components: a threat intelligence system that turns emerging threats into protections; a data platform that aggregates and enriches information while providing observability, monitoring, and ML operations; and a release platform enabling safe, rapid detection updates without disrupting customer workflows. Together, these components deliver layered protection against evolving LLM threats while generating training data for continuous model improvement and deploying updates without interrupting production.
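As a rough illustration of the release idea described above (layered detections that can be updated quickly without interrupting production), the following is a minimal sketch and not the paper's implementation. Every name here (`DetectionRelease`, `ProtectionService`, `regex_detector`, the regex patterns, and the benign-prompt gate) is a hypothetical assumption chosen for the example; the abstract does not specify how the platform is built.

```python
import re
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types: a detector returns a reason string when it flags a prompt.
Detector = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class DetectionRelease:
    """An immutable, versioned bundle of detection layers (assumed design)."""
    version: str
    detectors: Tuple[Detector, ...]


def regex_detector(name: str, pattern: str) -> Detector:
    """Build a simple pattern-based layer; real layers could be ML models."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def detect(prompt: str) -> Optional[str]:
        return name if compiled.search(prompt) else None

    return detect


class ProtectionService:
    def __init__(self, release: DetectionRelease) -> None:
        self._release = release
        self._lock = threading.Lock()

    def deploy(self, candidate: DetectionRelease, benign_prompts: List[str]) -> bool:
        """Swap in a new release only if it flags none of the known-benign prompts."""
        for prompt in benign_prompts:
            if any(d(prompt) for d in candidate.detectors):
                return False  # reject: would disrupt legitimate traffic
        with self._lock:
            self._release = candidate  # single reference swap; scans are not paused
        return True

    def scan(self, prompt: str) -> Dict[str, object]:
        release = self._release  # snapshot, so one scan uses one consistent release
        reasons = []
        for detector in release.detectors:
            reason = detector(prompt)
            if reason is not None:
                reasons.append(reason)
        return {"version": release.version, "blocked": bool(reasons), "reasons": reasons}
```

In this sketch, a new threat pattern would be packaged as a new `DetectionRelease`, checked against a small set of benign prompts, and then swapped in while `scan` keeps serving requests. The scan results, which carry the release version and the reasons for each block, stand in for the kind of records a data platform could collect for monitoring and later model training.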

