Unsupervised anomaly detection in IIoT edge device networks using federated learning

In a network of many IoT devices that each collect data, training a machine learning model normally involves transmitting the data to a central server, which requires strict privacy rules. However, some owners are reluctant to make their data available outside the company because of data security concerns. Federated learning (FL), a distributed machine learning approach, trains the model on the device that gathered the data. In this scenario, data is not shared over the network for training purposes. FedAvg, one of the FL algorithms, copies a model to the participating devices during a training session. The devices can be chosen at random, and a device can drop out before finishing. The resulting models are sent to the coordinating server, which then averages the models from the devices that finished training. The process is repeated until the desired model accuracy is achieved. In this way, the FL approach solves the privacy problem for IoT/IIoT devices that hold sensitive data for their owners. In this paper, we leverage the benefits of FL and implement the FedAvg algorithm on a recent dataset that represents modern IoT/IIoT device networks. The results were almost the same as those of the centralized machine learning approach. We also evaluated some shortcomings of FedAvg, such as the unfairness that arises during training when struggling devices do not participate in every stage of training. This inefficient training of the local or global model could lead to a high number of false alarms in intrusion detection systems for IoT/IIoT devices developed using FedAvg. Hence, after evaluating the FedAvg deep autoencoder against the centralized deep autoencoder, we further proposed and designed a Fair FedAvg algorithm that will be evaluated in future work.
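The training round described above (random device selection, local training on a copy of the global model, possible dropout, and server-side averaging of the finished models) can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper. The names `fedavg_round`, `local_train`, and `toy_local_train` are hypothetical, the model is reduced to a flat list of floats, and the averaging is weighted by local sample count as in the commonly cited FedAvg formulation, which is an assumption here because the abstract only says the models are averaged.

```python
import random
from typing import Callable, List, Optional, Sequence

Weights = List[float]


def fedavg_round(
    global_weights: Weights,
    client_sizes: Sequence[int],
    local_train: Callable[[int, Weights], Optional[Weights]],
    num_selected: int,
    rng: random.Random,
) -> Weights:
    """Run one FedAvg-style round and return the new global weights."""
    # Randomly pick the devices that take part in this round.
    selected = rng.sample(range(len(client_sizes)), num_selected)

    updates = []  # pairs of (local sample count, locally trained weights)
    for cid in selected:
        # Each device starts from its own copy of the current global model.
        result = local_train(cid, list(global_weights))
        if result is None:
            # The device dropped out before finishing, so it is skipped.
            continue
        updates.append((client_sizes[cid], result))

    if not updates:
        # No device finished: keep the previous global model unchanged.
        return list(global_weights)

    total = sum(n for n, _ in updates)
    # Average the finished models, weighted by local data size (assumption).
    return [
        sum(n * w[i] for n, w in updates) / total
        for i in range(len(global_weights))
    ]


def toy_local_train(cid: int, weights: Weights) -> Optional[Weights]:
    """Hypothetical stand-in for on-device training; device 2 always drops out."""
    if cid == 2:
        return None
    return [w + cid + 1 for w in weights]


new_global = fedavg_round(
    [0.0, 0.0], [10, 30, 60], toy_local_train, num_selected=3, rng=random.Random(0)
)
print(new_global)
```

In this toy run, device 2 holds the most data but never contributes to the global model. That is the kind of unfairness toward struggling devices that motivates a fairness-aware variant.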
