Simulated environments have proven invaluable in Autonomous Cyber Operations (ACO), where Reinforcement Learning (RL) agents can be trained without the computational overhead of emulation. These environments must accurately represent cybersecurity scenarios while producing the signals necessary to support RL training. In this study, we present a framework in which we first extend CybORG's Cage Challenge 2 environment by implementing three new actions (Patch, Isolate, and Unisolate) to better represent the capabilities available to human operators in real-world settings. We then propose a design for agent development in which we modify the reward signals and the agent's feature space to enhance training performance. To validate these modifications, we train DQN and PPO agents in the updated environment. Our study demonstrates that CybORG can be extended with additional realistic functionality while maintaining its ability to generate informative training signals for RL agents.
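The three defensive actions could be sketched as transitions on a simplified host model. This is a hypothetical illustration of the intended semantics only, not CybORG's actual API; the `Action`, `Host`, and `apply_action` names are invented for this sketch:

```python
from enum import Enum, auto
from dataclasses import dataclass

class Action(Enum):
    PATCH = auto()      # remove the exploited vulnerability permanently
    ISOLATE = auto()    # cut the host off from the network
    UNISOLATE = auto()  # restore the host's network connectivity

@dataclass
class Host:
    vulnerable: bool = True   # can the attacker (re-)exploit this host?
    isolated: bool = False    # is the host cut off from the network?

def apply_action(host: Host, action: Action) -> Host:
    """Toy transition function for the three new defensive actions."""
    if action is Action.PATCH:
        host.vulnerable = False   # a patched host cannot be re-exploited
    elif action is Action.ISOLATE:
        host.isolated = True      # blocks attacker lateral movement
    elif action is Action.UNISOLATE:
        host.isolated = False     # service restored after remediation
    return host

# Example: patch a host, then isolate and later unisolate it.
h = apply_action(Host(), Action.PATCH)
h = apply_action(h, Action.ISOLATE)
h = apply_action(h, Action.UNISOLATE)
```

In an RL setting, each such action would typically carry a trade-off in the reward signal (e.g. isolation stops the attacker but denies service), which is what makes the choice non-trivial for the agent.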