In November 2025, the authors ran a workshop on what makes a good reinforcement learning (RL) environment for autonomous cyber defence (ACD). This paper details the knowledge that participants shared during the workshop and, shortly afterwards, through their contributions to this paper. The participants come from academia, industry, and government, and have extensive hands-on experience designing and working with RL and cyber environments. Although there is now a sizeable body of literature on RL for ACD, a great deal of tradecraft, domain knowledge, and common hazards is not documented comprehensively in any single resource. Focusing specifically on building better environments to train and evaluate autonomous RL agents in network defence scenarios, including government and critical infrastructure networks, this work makes two contributions: (1) a framework for decomposing the interface between RL cyber environments and real systems, and (2) guidelines on current best practice for RL-based ACD environment development and agent evaluation, based on the key findings from our workshop.