With the capability to write convincing, fluent natural language and to generate code, foundation models (FMs) present dual-use concerns both broadly and within the cyber domain specifically. Generative AI has already begun to impact cyberspace through a broad illicit marketplace of hundreds of malicious-AI-as-a-service tools that assist malware development and social engineering attacks. More alarming, recent research has shown the potential for these advanced models to inform or even independently execute offensive cyberspace operations. However, previous investigations have focused primarily on the threats posed by proprietary models, owing to the until-recently limited availability of strong open-weight models, and they leave the impact of network defenses and potential countermeasures unexplored. Critically, understanding the aptitude of downloadable models to function as offensive cyber agents is vital, since such models are far more difficult to govern and their misuse far harder to prevent. As such, this work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks. Using target machines from a commercial provider, the most recently released downloadable models are found to be on par with a leading proprietary model at conducting simple cyber attacks with common hacking tools against known vulnerabilities. To mitigate such LLM-powered threats, defensive prompt injection (DPI) payloads that disrupt the malicious cyber agent's workflow are demonstrated to be effective. From these results, the implications for AI safety and governance with respect to cybersecurity are analyzed.