Automated penetration testing (AutoPT) based on reinforcement learning (RL) has been shown to improve the efficiency of vulnerability identification in information systems. However, RL-based penetration testing (PT) faces several challenges, including poor sample efficiency, complex reward specification, and limited interpretability. To address these issues, we propose DRLRM-PT, a knowledge-informed AutoPT framework that uses reward machines (RMs) to encode domain knowledge as guidelines for training a PT policy. As a case study, we focus on lateral movement and formulate it as a partially observable Markov decision process (POMDP) guided by RMs. We design two RMs for lateral movement based on the MITRE ATT&CK knowledge base. To solve the POMDP and optimize the PT policy, we employ deep Q-learning with RMs (DQRM). The experimental results show that the DQRM agent trains more efficiently in PT than agents without embedded knowledge. Moreover, RMs that encode more detailed domain knowledge yield better PT performance than RMs that encode simpler knowledge.
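To make the reward-machine idea concrete, the sketch below shows a small RM as a finite-state machine over high-level events, where each transition emits a reward. It also includes a tabular Q-learning update that conditions the value function on the RM state. This is a minimal illustration under stated assumptions, not the paper's DRLRM-PT implementation. The event names (`discover_node`, `obtain_credential`, `remote_connect`, `own_target`), the RM states, the reward values, and the helpers `RewardMachine`, `LATERAL_RM`, and `qrm_update` are all hypothetical. The update is a tabular, QRM-style counterfactual rule that stands in for the deep Q-network used by DQRM.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """Finite-state machine over high-level events.

    transitions maps (rm_state, event) -> (next_rm_state, reward).
    Any (state, event) pair not listed is a self-loop with reward 0.
    """
    initial: str
    terminal: frozenset
    transitions: dict

    def step(self, u, event):
        # Advance the RM on one labelled event and return (u_next, reward).
        return self.transitions.get((u, event), (u, 0.0))

    def states(self):
        # All RM states mentioned in the transition table, in a fixed order.
        found = {self.initial}
        for (u, _), (v, _) in self.transitions.items():
            found.update((u, v))
        return sorted(found)


# Hypothetical lateral-movement RM: event names loosely follow ATT&CK
# tactics (Discovery -> Credential Access -> Lateral Movement). The reward
# values are illustrative and are NOT taken from the paper.
LATERAL_RM = RewardMachine(
    initial="u0",
    terminal=frozenset({"u_acc"}),
    transitions={
        ("u0", "discover_node"): ("u1", 0.1),
        ("u1", "obtain_credential"): ("u2", 0.2),
        ("u2", "remote_connect"): ("u3", 0.3),
        ("u3", "own_target"): ("u_acc", 1.0),
    },
)


def qrm_update(Q, rm, obs, action, next_obs, event, actions, alpha=0.1, gamma=0.9):
    """Tabular Q-learning with an RM (QRM-style counterfactual update).

    The single environment transition (obs, action, next_obs), labelled
    with `event`, is replayed from every non-terminal RM state u, so that
    Q(obs, u, action) is updated for all u at once.
    """
    for u in rm.states():
        if u in rm.terminal:
            continue
        u_next, r = rm.step(u, event)
        if u_next in rm.terminal:
            target = r  # the episode ends once the RM accepts
        else:
            target = r + gamma * max(Q[(next_obs, u_next, a)] for a in actions)
        key = (obs, u, action)
        Q[key] += alpha * (target - Q[key])


if __name__ == "__main__":
    # Walk the RM through one event trace. Irrelevant events such as
    # "scan_port" self-loop and earn no reward.
    u, total = LATERAL_RM.initial, 0.0
    for ev in ["discover_node", "scan_port", "obtain_credential",
               "remote_connect", "own_target"]:
        u, r = LATERAL_RM.step(u, ev)
        total += r
    print(u, round(total, 2))
```

Running the script prints `u_acc 1.6`. The RM reaches its accepting state only after the events occur in the intended order. Here, the ordering of the intermediate rewards is what encodes the domain knowledge. In this sketch, a more detailed RM would simply add more intermediate states and shaped rewards to the transition table.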