Reinforcement Learning (RL) has shown great potential for autonomous decision-making in the cybersecurity domain, enabling agents to learn through direct environment interaction. However, RL agents in Autonomous Cyber Operations (ACO) typically learn from scratch, requiring them to execute undesirable actions to learn their consequences. In this study, we integrate external knowledge in the form of a Large Language Model (LLM) pretrained on cybersecurity data that our RL agent can directly leverage to make informed decisions. By guiding initial training with an LLM, we improve baseline performance and reduce the need for exploratory actions with obviously negative outcomes. We evaluate our LLM-integrated approach in a simulated cybersecurity environment, and demonstrate that our guided agent achieves over 2x higher rewards during early training and converges to a favorable policy approximately 4,500 episodes faster than the baseline.
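The guidance scheme described above can be sketched minimally: during early episodes the agent samples actions from an LLM-suggested prior, then anneals toward acting greedily on its own Q-values. This is an illustrative sketch only; the action names, the hard-coded `llm_prior`, and the `guidance_episodes` schedule are hypothetical stand-ins, not the paper's actual method or environment.

```python
import random

# Hypothetical tiny action space for an ACO-style agent (illustrative only).
ACTIONS = ["scan", "patch", "restore", "exploit_check"]

# Stand-in for LLM guidance: a fixed prior over actions. In the paper's
# setting this would be produced by a cybersecurity-pretrained LLM; here it
# is hard-coded purely for illustration.
llm_prior = {"scan": 0.4, "patch": 0.3, "restore": 0.2, "exploit_check": 0.1}

def select_action(q_values, episode, guidance_episodes=500, rng=random):
    """Blend LLM guidance with learned Q-values during early training.

    With probability p (annealed linearly to 0 over `guidance_episodes`),
    sample an action from the LLM prior instead of acting greedily on the
    agent's current Q-values.
    """
    p = max(0.0, 1.0 - episode / guidance_episodes)  # guidance probability
    if rng.random() < p:
        actions, weights = zip(*llm_prior.items())
        return rng.choices(actions, weights=weights, k=1)[0]
    return max(q_values, key=q_values.get)  # greedy w.r.t. Q-values
```

Because the guidance probability decays to zero, the agent avoids obviously harmful exploratory actions early on yet retains a standard RL policy once its value estimates mature.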