Recent research has shown that integrating Reinforcement Learning (RL) with Moving Target Defense (MTD) can enhance cybersecurity in Internet-of-Things (IoT) devices. Nevertheless, the practicality of existing work is hindered by two issues: the data privacy concerns raised by centralized data processing in RL, and the long time needed to learn which MTD techniques are effective against a growing number of heterogeneous zero-day attacks. This work therefore presents CyberForce, a framework that combines Federated Learning and Reinforcement Learning (FRL) to collaboratively and privately learn suitable MTD techniques for mitigating zero-day attacks. CyberForce integrates device fingerprinting and anomaly detection to reward or penalize the MTD mechanisms chosen by an FRL-based agent. The framework has been deployed and evaluated in a scenario consisting of ten physical devices of a real IoT platform affected by heterogeneous malware samples. A set of experiments has demonstrated that CyberForce learns the MTD technique that mitigates each attack faster than existing centralized RL-based approaches. In addition, when different devices are exposed to different attacks, CyberForce benefits from knowledge transfer, which improves performance and reduces learning time in comparison to recent works. Finally, the different aggregation algorithms used during the agent learning process provide CyberForce with notable robustness to malicious attacks.
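The learning loop summarized above can be illustrated with a toy sketch. The abstract does not specify the agent architecture, the reward values, or the aggregation rules, so everything below is an assumption made for illustration: a tabular value estimate per (attack, MTD technique) pair, a reward of +1 or -1 standing in for the fingerprinting and anomaly-detection verdict, an entry-wise mean as a FedAvg-style aggregator, and an entry-wise median as one possible robust aggregator. All names (`anomaly_reward`, `local_update`, `aggregate`, `train`, `policy`) are hypothetical and are not taken from CyberForce.

```python
import random
from statistics import median

N_ATTACKS = 3  # hypothetical number of distinct attack states
N_MTD = 3      # hypothetical number of candidate MTD techniques


def anomaly_reward(attack: int, mtd: int) -> float:
    """Toy stand-in for fingerprinting + anomaly detection.

    Assumption: technique i mitigates attack i, so the device looks
    normal again (+1); any other choice leaves it anomalous (-1).
    """
    return 1.0 if attack == mtd else -1.0


def local_update(q, attacks, rng, episodes=20, lr=0.5, eps=0.2):
    """Epsilon-greedy value updates on a private copy of the global table."""
    q = [row[:] for row in q]
    for _ in range(episodes):
        attack = rng.choice(attacks)
        if rng.random() < eps:
            mtd = rng.randrange(N_MTD)  # explore
        else:
            mtd = max(range(N_MTD), key=lambda a: q[attack][a])  # exploit
        q[attack][mtd] += lr * (anomaly_reward(attack, mtd) - q[attack][mtd])
    return q


def aggregate(tables, how="mean"):
    """Combine client tables entry by entry (mean or median)."""
    if how == "median":
        combine = median
    else:
        combine = lambda xs: sum(xs) / len(xs)
    return [[combine([t[s][a] for t in tables]) for a in range(N_MTD)]
            for s in range(N_ATTACKS)]


def train(client_attacks, rounds=30, how="mean", poisoned=0, seed=0):
    """Federated rounds: only value tables leave the devices, never raw data."""
    rng = random.Random(seed)
    global_q = [[0.0] * N_MTD for _ in range(N_ATTACKS)]
    for _ in range(rounds):
        tables = [local_update(global_q, atk, rng) for atk in client_attacks]
        # Hypothetical poisoning: the first `poisoned` clients submit a table
        # that strongly favors every wrong technique.
        for i in range(poisoned):
            tables[i] = [[100.0 if a != s else 0.0 for a in range(N_MTD)]
                         for s in range(N_ATTACKS)]
        global_q = aggregate(tables, how)
    return global_q


def policy(q):
    """Greedy MTD choice per attack according to the aggregated table."""
    return [max(range(N_MTD), key=lambda a: q[s][a]) for s in range(N_ATTACKS)]
```

In this sketch, calling `policy(train([[0], [1], [2]]))` has three devices that each observe a single attack, yet the aggregated table covers all three, which is a simplified picture of the knowledge transfer the abstract describes. Calling `train([[0, 1, 2]] * 5, how="median", poisoned=1)` and the same call with `how="mean"` contrasts how the two aggregators react to one poisoned client. Note that an entry-wise median only helps when a majority of clients has honest information about each entry, so it is not a drop-in replacement in the disjoint-attack setting.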