Recent research has shown that integrating Reinforcement Learning (RL) with Moving Target Defense (MTD) can enhance cybersecurity in Internet-of-Things (IoT) devices. Nevertheless, the practicality of existing work is hindered by the data privacy concerns of centralized data processing in RL, and by the excessive time needed to learn the right MTD techniques against a growing number of heterogeneous zero-day attacks. This work therefore presents CyberForce, a framework that combines Federated and Reinforcement Learning (FRL) to collaboratively and privately learn suitable MTD techniques for mitigating zero-day attacks. CyberForce integrates device fingerprinting and anomaly detection to reward or penalize the MTD mechanisms chosen by an FRL-based agent. The framework has been deployed and evaluated in a scenario consisting of ten physical devices of a real IoT platform affected by heterogeneous malware samples. A set of experiments demonstrates that CyberForce learns the MTD technique that mitigates each attack faster than existing centralized RL-based approaches. In addition, when different devices are exposed to different attacks, CyberForce benefits from knowledge transfer, which improves performance and reduces learning time compared with recent works. Finally, the different aggregation algorithms used during the agent learning process provide CyberForce with notable robustness to malicious attacks.
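The learning loop summarized in the abstract can be illustrated with a minimal sketch. It is not the CyberForce implementation: the attack and MTD labels, the tabular value estimates, the +1/-1 reward, and the two aggregation rules (entry-wise mean and entry-wise median) are all assumptions chosen for illustration, since the abstract does not specify them. Each device updates a local table from the anomaly detector's verdict, and a server combines the tables so that devices benefit from attacks they have not seen themselves.

```python
from statistics import mean, median

# Hypothetical labels; the abstract does not name attacks or MTD techniques.
ATTACKS = ["attack_a", "attack_b"]
MTDS = ["mtd_x", "mtd_y"]

def new_q_table():
    """Value estimate for every (attack, MTD) pair, initialised to zero."""
    return {a: {m: 0.0 for m in MTDS} for a in ATTACKS}

def local_update(q, attack, mtd, mitigated, lr=0.5):
    """Assumed reward: +1 if the anomaly detector reports normal behaviour
    after the MTD is deployed, -1 otherwise."""
    reward = 1.0 if mitigated else -1.0
    q[attack][mtd] += lr * (reward - q[attack][mtd])

def aggregate(tables, rule=mean):
    """Combine client tables entry by entry with the given rule."""
    return {a: {m: rule([t[a][m] for t in tables]) for m in MTDS}
            for a in ATTACKS}

def best_mtd(q, attack):
    """MTD with the highest estimated value for the given attack."""
    return max(q[attack], key=q[attack].get)

# Two devices, each exposed to a different attack.
q1, q2 = new_q_table(), new_q_table()
local_update(q1, "attack_a", "mtd_x", mitigated=True)
local_update(q1, "attack_a", "mtd_y", mitigated=False)
local_update(q2, "attack_b", "mtd_y", mitigated=True)
local_update(q2, "attack_b", "mtd_x", mitigated=False)

# Knowledge transfer: the aggregated table covers both attacks.
shared = aggregate([q1, q2])

# A poisoned client table (hypothetical attacker) against two honest copies.
poisoned = new_q_table()
poisoned["attack_a"]["mtd_x"] = -100.0
poisoned["attack_a"]["mtd_y"] = 100.0
honest = [q1, {a: dict(v) for a, v in q1.items()}]
by_mean = aggregate(honest + [poisoned], rule=mean)
by_median = aggregate(honest + [poisoned], rule=median)
```

In this toy setting the shared table recommends `mtd_x` for `attack_a` and `mtd_y` for `attack_b`, although neither device saw both attacks. With one poisoned table among three, the mean is pulled toward the attacker's choice, while the median keeps the honest recommendation; this is one simple way an aggregation rule can affect robustness.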