Federated Learning (FL) has recently received increasing attention from the cybersecurity community as a way to collaboratively train deep learning models on distributed profiles of cyber threats without disclosing the training data. Nevertheless, the adoption of FL in cybersecurity is still in its infancy, and a range of practical aspects have not yet been properly addressed. In particular, the Federated Averaging algorithm at the core of the FL concept requires the availability of test data to control the FL process. Although this might be feasible in some domains, test network traffic of newly discovered attacks cannot always be shared without disclosing sensitive information. In this paper, we address the convergence of the FL process in dynamic cybersecurity scenarios, where the trained model must be frequently updated with recent attack profiles to provide all members of the federation with the latest detection features. To this end, we propose FLAD (adaptive Federated Learning Approach to DDoS attack detection), an FL solution for cybersecurity applications based on an adaptive mechanism that orchestrates the FL process by dynamically assigning more computation to those members whose attack profiles are harder to learn, without the need to share any test data to monitor the performance of the trained model. Using a recent dataset of DDoS attacks, we demonstrate that FLAD outperforms state-of-the-art FL algorithms in terms of convergence time and accuracy across a range of unbalanced datasets of heterogeneous DDoS attacks. We also show the robustness of our approach in a realistic scenario, where we retrain the deep learning model multiple times to introduce the profiles of new attacks into a pre-trained model.
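The abstract does not spell out how computation is assigned, so the following is only a minimal illustrative sketch and not FLAD's actual rule. It assumes that each member reports a score computed on its own local validation data (so no test data leaves the member) and that the server linearly maps lower scores to more local epochs. The names `fedavg` and `assign_epochs`, the epoch bounds, and the linear mapping are all hypothetical.

```python
from typing import Dict, List


def fedavg(weights: List[List[float]], sizes: List[int]) -> List[float]:
    """Federated Averaging: average client parameter vectors,
    weighting each client by its number of training samples."""
    total = sum(sizes)
    return [
        sum(w[i] * n for w, n in zip(weights, sizes)) / total
        for i in range(len(weights[0]))
    ]


def assign_epochs(
    scores: Dict[str, float], min_epochs: int = 1, max_epochs: int = 5
) -> Dict[str, int]:
    """Hypothetical adaptive rule (an assumption, not taken from the paper):
    clients with lower locally computed scores get more local epochs."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All clients perform equally: give everyone the minimum budget.
        return {client: min_epochs for client in scores}
    span = max_epochs - min_epochs
    return {
        client: min_epochs + round(span * (hi - s) / (hi - lo))
        for client, s in scores.items()
    }
```

With scores of 1.0, 0.5 and 0.75 for three members, this sketch assigns 1, 5 and 3 local epochs respectively, so the member whose attack profile is learned worst receives the most computation in the next round.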