The analysis of distributed techniques often focuses on their efficiency without considering their robustness (or lack thereof). Such a consideration is particularly important when devices or central servers can fail, since such failures can cripple distributed systems. When these failures arise in wireless communications networks, important services that the networks use or provide (such as anomaly detection) can be left inoperable, which can result in a cascade of security problems. In this paper, we present a novel method that addresses these risks by combining flat and star topologies, capturing the performance and reliability benefits of both. We refer to this method as "Tol-FL" because of its increased failure tolerance compared to Federated Learning. Our approach limits device failure risks while outperforming prior methods by up to 8% in anomaly detection AUROC across a range of realistic settings that consider both client and server failures, all while reducing communication costs. This performance demonstrates that Tol-FL is well suited to distributed model training for anomaly detection, especially in the domain of wireless networks.