Recent research has shown that integrating Reinforcement Learning (RL) with Moving Target Defense (MTD) can enhance cybersecurity in Internet-of-Things (IoT) devices. Nevertheless, the practicality of existing work is hindered by two issues: the data privacy concerns associated with centralized data processing in RL, and the long time needed to learn the right MTD techniques against a rising number of heterogeneous zero-day attacks. This work therefore presents CyberForce, a framework that combines Federated and Reinforcement Learning (FRL) to collaboratively and privately learn suitable MTD techniques for mitigating zero-day attacks. CyberForce integrates device fingerprinting and anomaly detection to reward or penalize the MTD mechanisms chosen by an FRL-based agent. The framework has been deployed and evaluated in a scenario consisting of ten physical devices of a real IoT platform affected by heterogeneous malware samples. A set of experiments has demonstrated that CyberForce learns the MTD technique that mitigates each attack faster than existing centralized RL-based approaches. In addition, when different devices are exposed to different attacks, CyberForce benefits from knowledge transfer, which improves performance and reduces learning time in comparison to recent works. Finally, the different aggregation algorithms used during the agent learning process provide CyberForce with notable robustness to malicious attacks.