StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems

Pavlos S. Bouzinis, Panagiotis Radoglou-Grammatikis, Ioannis Makris, Thomas Lagkas, Vasileios Argyriou, Georgios Th. Papadopoulos, Panagiotis Sarigiannidis, George K. Karagiannidis

From arXiv; 10 pages, 8 figures

Federated learning (FL) is a decentralized learning technique that enables participating devices to collaboratively build a shared Machine Learning (ML) or Deep Learning (DL) model without revealing their raw data to a third party. Owing to its privacy-preserving nature, FL has attracted widespread attention for building Intrusion Detection Systems (IDS) in cybersecurity. However, data heterogeneity across the participating domains and entities poses significant challenges to the reliable implementation of an FL-based IDS. In this paper, we propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independent and identically distributed (non-iid) features across the local clients' data in FL. In particular, StatAvg allows the FL clients to share their individual data statistics with the server, which aggregates this information to produce global statistics. These global statistics are then shared with the clients and used for universal data normalisation. Because it takes place before the actual FL training process, StatAvg can be seamlessly integrated with any FL aggregation strategy. The proposed method is evaluated against baseline approaches using datasets for network-based and host-based Artificial Intelligence (AI)-powered IDS. The experimental results demonstrate the effectiveness of StatAvg in mitigating non-iid feature distributions across the FL clients compared with the baseline methods.
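The share-aggregate-normalise flow described above can be illustrated with a minimal sketch. The abstract does not say which statistics are exchanged, so this example assumes each client sends only its per-feature sample count, mean and variance, and that the server pools them into a global mean and variance used for z-score normalisation. The function names `local_stats`, `aggregate_stats` and `normalise`, and the synthetic client data, are hypothetical and are not taken from the paper.

```python
import numpy as np

# Assumption: clients share (count, mean, variance) per feature; the paper's
# exact choice of statistics is not stated in the abstract.

def local_stats(x):
    """Client side (hypothetical helper): per-feature count, mean and population variance."""
    x = np.asarray(x, dtype=float)
    return x.shape[0], x.mean(axis=0), x.var(axis=0)

def aggregate_stats(stats):
    """Server side (hypothetical helper): pool per-client (n, mean, var) into global values.

    Uses the law of total variance: within-client variance plus the spread of
    each client's mean around the global mean, weighted by client sample counts.
    """
    n_total = sum(n for n, _, _ in stats)
    global_mean = sum(n * m for n, m, _ in stats) / n_total
    global_var = sum(n * (v + (m - global_mean) ** 2) for n, m, v in stats) / n_total
    return global_mean, global_var

def normalise(x, mean, var, eps=1e-8):
    """Client side (hypothetical helper): z-score normalisation with the shared global statistics."""
    return (np.asarray(x, dtype=float) - mean) / np.sqrt(var + eps)

# Two clients with deliberately different (non-iid) feature distributions.
rng = np.random.default_rng(0)
client_a = rng.normal(0.0, 1.0, size=(100, 3))
client_b = rng.normal(5.0, 2.0, size=(50, 3))

# Only the statistics, never the raw rows, reach the server.
g_mean, g_var = aggregate_stats([local_stats(client_a), local_stats(client_b)])

# Every client normalises its own data with the same global statistics.
z_a = normalise(client_a, g_mean, g_var)
z_b = normalise(client_b, g_mean, g_var)
```

With this pooling rule the global mean and variance equal those of the concatenated client data, so each client applies an identical transformation without any raw records leaving the device. Under this sketch's assumptions, normalising each client with only its own local statistics would instead rescale every client's features differently.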