The expansion of the Internet-of-Things (IoT) paradigm is inevitable, but the vulnerability of IoT devices to malware has become a growing concern. Recent research has shown that integrating Reinforcement Learning (RL) with Moving Target Defense (MTD) mechanisms can enhance the cybersecurity of IoT devices. Nevertheless, the large number of new malware attacks and the time agents need to learn and select effective MTD techniques make this approach impractical for real-world IoT scenarios. To tackle this issue, this work presents CyberForce, a framework that employs Federated Reinforcement Learning (FRL) to collaboratively and privately determine suitable MTD techniques for mitigating diverse zero-day attacks. CyberForce integrates device fingerprinting and anomaly detection to reward or penalize the MTD mechanisms chosen by an FRL-based agent. The framework has been evaluated in a federation of ten devices from a real IoT platform. A set of experiments with six malware samples affecting the devices demonstrates that CyberForce can accurately learn optimal MTD mitigation strategies. When all clients are affected by all attacks, the FRL agent achieves high accuracy and a reduced training time compared to a centralized RL agent. When different clients experience distinct attacks, CyberForce clients benefit from knowledge transferred from other clients and from similar attack behavior. Additionally, CyberForce shows notable robustness against data poisoning attacks.
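The interplay described above (an agent picks an MTD technique, an anomaly detector rewards or penalizes the choice, and clients share what they learn without sharing raw data) can be illustrated with a toy simulation. The sketch below is not the CyberForce implementation: it assumes a small tabular value function, a one-step bandit-style update rather than full Q-learning, plain element-wise averaging as the aggregation rule, and a simulated anomaly detector. All names (`EFFECTIVE_MTD`, `anomaly_reward`, `run_federation`, and so on), the numbers of attacks and MTD techniques, and the hyperparameters are hypothetical.

```python
import random

N_STATES = 3   # hypothetical number of observed attack behaviours
N_ACTIONS = 4  # hypothetical number of available MTD techniques

# Hypothetical ground truth, used only to simulate the anomaly detector:
# EFFECTIVE_MTD[s] is the MTD technique that returns the device to normal.
EFFECTIVE_MTD = {0: 2, 1: 0, 2: 3}


def anomaly_reward(state, action):
    """Stand-in for fingerprinting + anomaly detection after deploying an MTD."""
    return 1.0 if EFFECTIVE_MTD[state] == action else -1.0


def local_training(q, states, rng, episodes=50, lr=0.5, epsilon=0.2):
    """Epsilon-greedy one-step value updates on one client; returns a new table."""
    q = [row[:] for row in q]  # work on a copy of the global table
    for _ in range(episodes):
        s = rng.choice(states)  # an attack this client actually experiences
        if rng.random() < epsilon:
            a = rng.randrange(N_ACTIONS)  # explore
        else:
            a = max(range(N_ACTIONS), key=lambda i: q[s][i])  # exploit
        q[s][a] += lr * (anomaly_reward(s, a) - q[s][a])
    return q


def federated_average(tables):
    """Server step: element-wise mean of the clients' tables."""
    n = len(tables)
    return [[sum(t[s][a] for t in tables) / n for a in range(N_ACTIONS)]
            for s in range(N_STATES)]


def run_federation(client_states, rounds=20, seed=0):
    """Alternate local training and averaging; only tables leave the clients."""
    rng = random.Random(seed)
    global_q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(rounds):
        local = [local_training(global_q, states, rng) for states in client_states]
        global_q = federated_average(local)
    return global_q


def greedy_policy(q):
    """Best-valued MTD technique for each attack behaviour."""
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]
```

Calling `greedy_policy(run_federation([[0], [1], [2]]))` models the case where each client experiences a different attack. Under these simulated rewards the shared table should recover the mapping in `EFFECTIVE_MTD` for all three attacks, so every client ends up with a policy for attacks it never observed locally.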