Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework

While virtualization and resource pooling give cloud networks structural flexibility and elastic scalability, they also expand the attack surface and challenge cyber resilience. Reinforcement Learning (RL)-based defense strategies have been developed to optimize resource deployment and isolation policies under adversarial conditions, aiming to enhance system resilience by maintaining and restoring network availability. However, existing approaches lack robustness because they must be retrained to adapt to changes in network structure, node scale, attack strategy, and attack intensity. Furthermore, the absence of Human-in-the-Loop (HITL) support limits their interpretability and flexibility. To address these limitations, we propose CyberOps-Bots, a hierarchical multi-agent reinforcement learning framework empowered by Large Language Models (LLMs). Inspired by the Tactics-Techniques model of MITRE ATT&CK, CyberOps-Bots features a two-layer architecture: (1) an upper-level LLM agent with four modules (ReAct planning, IPDRR-based perception, long- and short-term memory, and action/tool integration) that performs global awareness, human intent recognition, and tactical planning; and (2) lower-level RL agents, developed via heterogeneous separated pre-training, that execute atomic defense actions within localized network regions. This synergy preserves the adaptability and interpretability of the LLM while ensuring reliable RL execution. Experiments on real cloud datasets show that, compared to state-of-the-art algorithms, CyberOps-Bots maintains 68.5% higher network availability and achieves a 34.7% jumpstart performance gain when scenarios shift, without retraining. To our knowledge, this is the first study to establish a robust LLM-RL framework with HITL support for cloud defense. We will release our framework to the community to facilitate the advancement of robust and autonomous defense in cloud networks.
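The two-layer split described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical and is not the released framework: the planner is a fixed rule standing in for the upper-level LLM agent, the three policy functions stand in for pre-trained lower-level RL agents, and the tactic labels (borrowed from the IPDRR phases), the `RegionState` fields, and the action strings are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RegionState:
    """Hypothetical summary of one localized network region."""
    name: str
    compromised: int  # nodes currently compromised
    isolated: int     # nodes currently isolated from the network


def plan_tactic(region: RegionState) -> str:
    """Stand-in for the upper-level planner (a fixed rule, not an LLM)."""
    if region.compromised > 0:
        return "respond"
    if region.isolated > 0:
        return "recover"
    return "protect"


# Stand-ins for pre-trained lower-level policies: each maps a region
# state to a single atomic defense action (action names are assumed).
def respond_policy(region: RegionState) -> str:
    return f"isolate_node@{region.name}"


def recover_policy(region: RegionState) -> str:
    return f"restore_node@{region.name}"


def protect_policy(region: RegionState) -> str:
    return f"harden_node@{region.name}"


POLICIES: Dict[str, Callable[[RegionState], str]] = {
    "respond": respond_policy,
    "recover": recover_policy,
    "protect": protect_policy,
}


def defend_step(regions: List[RegionState]) -> List[str]:
    """Pick a tactic per region, then let the matching policy act."""
    return [POLICIES[plan_tactic(r)](r) for r in regions]


regions = [
    RegionState("web", compromised=2, isolated=0),
    RegionState("db", compromised=0, isolated=1),
    RegionState("cache", compromised=0, isolated=0),
]
for action in defend_step(regions):
    print(action)
```

In this sketch, adding or removing a region changes only the list passed to `defend_step`, not the per-tactic policies. That is one way to picture how a tactic-level planner over region-level executors might tolerate changes in node scale, though the abstract does not specify the actual interface between the two layers.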
