LLM agents have the potential to revolutionize defensive cyber operations, but their offensive capabilities are not yet fully understood. To prepare for emerging threats, model developers and governments are evaluating the cyber capabilities of foundation models. However, these assessments often lack transparency and a comprehensive focus on offensive capabilities. In response, we introduce the Catastrophic Cyber Capabilities Benchmark (3CB), a novel framework designed to rigorously assess the real-world offensive capabilities of LLM agents. Our evaluation of modern LLMs on 3CB reveals that frontier models, such as GPT-4o and Claude 3.5 Sonnet, can perform offensive tasks such as reconnaissance and exploitation across domains ranging from binary analysis to web technologies. Smaller open-source models, by contrast, exhibit limited offensive capabilities. Our software solution and the corresponding benchmark provide a critical tool for narrowing the gap between the rapidly improving capabilities of LLM agents and the robustness of cyber offense evaluations, aiding the safer deployment and regulation of these powerful technologies.