CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher

Penetration testing, a critical component of cybersecurity, typically requires extensive time and effort to find vulnerabilities. Beginners in this field often benefit from collaborating with the community or with experts. To address this, we develop CIPHER (Cybersecurity Intelligent Penetration-testing Helper for Ethical Researchers), a large language model trained specifically to assist with penetration testing tasks. We trained CIPHER on over 300 high-quality write-ups of vulnerable machines, on hacking techniques, and on documentation of open-source penetration testing tools. We also introduce Findings, Action, Reasoning, and Results (FARR) Flow augmentation, a novel method that augments penetration testing write-ups to establish a fully automated pentesting simulation benchmark tailored to large language models. This approach fills a significant gap left by traditional cybersecurity Q&A benchmarks and provides a realistic and rigorous standard for evaluating an AI's technical knowledge, reasoning capabilities, and practical utility in dynamic penetration testing scenarios. In our assessments, CIPHER achieved the best overall performance in providing accurate suggestions, compared both with other open-source penetration testing models of similar size and with larger state-of-the-art models such as Llama 3 70B and Qwen1.5 72B Chat, particularly on machines of insane difficulty. This demonstrates that the current capabilities of general LLMs are insufficient for effectively guiding users through the penetration testing process. We also discuss the potential for improvement through scaling and through the development of better benchmarks using FARR Flow augmentation results. Our benchmark will be released publicly at https://github.com/ibndias/CIPHER.

