We are releasing a new suite of security benchmarks for LLMs, CYBERSECEVAL 3, to continue the conversation on empirically measuring LLM cybersecurity risks and capabilities. CYBERSECEVAL 3 assesses 8 different risks across two broad categories: risks to third parties, and risks to application developers and end users. Compared to previous work, we add new areas focused on offensive security capabilities: automated social engineering, scaling manual offensive cyber operations, and autonomous offensive cyber operations. In this paper we discuss applying these benchmarks to the Llama 3 models and a suite of contemporaneous state-of-the-art LLMs, enabling us to contextualize risks both with and without mitigations in place.