Raijū: Reinforcement Learning-Guided Post-Exploitation for Automating Security Assessment of Network Systems

To assess the risks of a network system, it is important to investigate how attackers behave after a successful exploitation, a phase known as post-exploitation. Although various efficient tools support post-exploitation, no application can automate this process. Most of its steps are carried out by experts with profound security knowledge, known as penetration testers or pen-testers. To this end, our study proposes the Raijū framework, a Reinforcement Learning (RL)-driven automation approach that assists pen-testers in quickly carrying out post-exploitation for security-level evaluation of network systems. We implement two RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), to train specialized agents capable of choosing intelligent actions. These actions are Metasploit modules that automatically launch privilege escalation, hashdump gathering, and lateral movement attacks. By leveraging RL, we aim to give these agents the ability to autonomously select and execute actions that exploit vulnerabilities in target systems. This approach allows us to automate certain aspects of the penetration testing workflow, making it more efficient and more responsive to emerging threats and vulnerabilities. The experiments are performed in four real environments, with agents trained over thousands of episodes. The agents automatically select actions and launch attacks on the environments, achieving a success rate of over 84% within fewer than 55 attack steps. Moreover, the A2C algorithm has proved extremely effective at selecting proper actions for automating post-exploitation.
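The evaluation setup described in the abstract (an agent picks one action per step, an episode counts as a success if the goal is reached within the step budget, and the success rate is tallied over many episodes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ToyPostExploitEnv` class, the three action labels, the reward values, and the `oracle_policy` placeholder are hypothetical and do not come from the paper. The toy environment simulates outcomes with a random number generator and does not invoke Metasploit or any real tooling; a trained A2C or PPO policy would take the place of the placeholder policy.

```python
import random

# Hypothetical action labels. In the paper the actions are Metasploit modules;
# these three strings only mirror the attack categories named in the abstract.
ACTIONS = ["privilege_escalation", "gather_hashdump", "lateral_movement"]
MAX_STEPS = 55  # step budget, taken from the figure quoted in the abstract


class ToyPostExploitEnv:
    """Toy stand-in environment (an assumption, not the paper's setup).

    The goal is reached once the three stages succeed in order.
    Outcomes are simulated; nothing is executed against a real host.
    """

    def __init__(self, success_prob=0.5, seed=0):
        self.success_prob = success_prob
        self.rng = random.Random(seed)
        self.stage = 0

    def reset(self):
        self.stage = 0
        return self.stage

    def step(self, action):
        # An action only helps if it matches the current stage
        # and the simulated attempt succeeds.
        if action == ACTIONS[self.stage] and self.rng.random() < self.success_prob:
            self.stage += 1
        done = self.stage == len(ACTIONS)
        reward = 1.0 if done else -0.01  # illustrative reward shaping only
        return self.stage, reward, done


def oracle_policy(state):
    """Placeholder policy: always picks the action for the current stage."""
    return ACTIONS[state]


def run_episode(env, policy, max_steps=MAX_STEPS):
    """Run one episode; return (succeeded, steps_used)."""
    state = env.reset()
    for step in range(1, max_steps + 1):
        state, _reward, done = env.step(policy(state))
        if done:
            return True, step
    return False, max_steps


def success_rate(env, policy, episodes=1000):
    """Fraction of episodes that reach the goal within the step budget."""
    wins = sum(run_episode(env, policy)[0] for _ in range(episodes))
    return wins / episodes
```

With `success_prob=1.0` the placeholder policy finishes in three steps, and with `success_prob=0.0` every episode exhausts the 55-step budget. These two extremes are properties of the toy environment only and say nothing about the results reported in the paper.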

