With the capability to write convincing, fluent natural language and to generate code, foundation models (FMs) present dual-use concerns both broadly and within the cyber domain specifically. Generative AI has already begun to impact cyberspace through a broad illicit marketplace of hundreds of malicious-AI-as-a-service tools that assist malware development and social engineering attacks. More alarming, recent research has shown the potential for these advanced models to inform or even independently execute offensive cyberspace operations. However, previous investigations have focused primarily on the threats posed by proprietary models, owing to the until-recently limited availability of strong open-weight models, and they leave the impact of network defenses and potential countermeasures unexplored. Critically, understanding the aptitude of downloadable models to function as offensive cyber agents is vital, since such models are far more difficult to govern and their misuse far harder to prevent. As such, this work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks. Using target machines from a commercial provider, the most recently released downloadable models are found to be on par with a leading proprietary model at conducting simple cyber attacks with common hacking tools against known vulnerabilities. To mitigate such LLM-powered threats, defensive prompt injection (DPI) payloads that disrupt the malicious cyber agent's workflow are demonstrated to be effective. From these results, the implications for AI safety and governance with respect to cybersecurity are analyzed.