To assess the risks of a network system, it is important to investigate the behavior of attackers after successful exploitation, a phase known as post-exploitation. Although various efficient tools support post-exploitation, no existing application automates this process. Most of its steps are carried out by experts with profound security knowledge, known as penetration testers or pen-testers. To address this gap, our study proposes the Raij\=u framework, a Reinforcement Learning (RL)-driven automation approach that helps pen-testers quickly carry out post-exploitation when evaluating the security level of network systems. We implement two RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), to train specialized agents that select actions intelligently. Each action is a Metasploit module that automatically launches one of three kinds of attack: privilege escalation, hash dumping, and lateral movement. By leveraging RL, we aim to give these agents the ability to autonomously select and execute actions that exploit vulnerabilities in target systems. This approach automates certain parts of the penetration testing workflow, making it more efficient and more responsive to emerging threats and vulnerabilities. Experiments are performed in four real environments, with agents trained over thousands of episodes. The agents automatically select actions and launch attacks on the environments, achieving a success rate of over 84\% within fewer than 55 attack steps. Moreover, the A2C algorithm proved highly effective at selecting proper actions for the automation of post-exploitation.
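As an illustration of the two algorithms named above, the following minimal sketch shows the standard textbook quantities behind them: the one-step advantage estimate used by A2C and the clipped surrogate term used by PPO. It is not the Raij\=u implementation. The function names, the action labels, the discount factor, and the clipping range are all assumptions made for illustration, and no Metasploit call is involved.

```python
import math

# Hypothetical labels for the three attack categories named in the abstract.
ACTIONS = ["privilege_escalation", "hashdump", "lateral_movement"]


def softmax(logits):
    """Turn raw policy scores into a probability distribution over actions."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_step_advantage(reward, value, next_value, gamma=0.99, done=False):
    """A2C-style advantage: bootstrapped one-step return minus the critic's estimate."""
    target = reward + (0.0 if done else gamma * next_value)
    return target - value


def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one sample.

    ratio is new_policy_prob / old_policy_prob for the action taken.
    """
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)


# An untrained policy with equal scores picks each action with equal probability.
probs = softmax([0.0, 0.0, 0.0])
action_probs = dict(zip(ACTIONS, probs))
```

In this sketch, a positive advantage raises the probability of the chosen action, while the PPO clip limits how far a single update can move the policy away from the one that collected the data.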