Wind turbine manufacturers collect terabytes of data from their fleets every day, yet a lack of data access and sharing prevents the full potential of these data from being exploited. We present a distributed machine learning approach that preserves data privacy by leaving the data on the wind turbines while still enabling fleet-wide learning on those local data. We show that, through federated fleet-wide learning, turbines with little or no representative training data can benefit from more accurate normal behavior models. Customizing the global federated model to individual turbines yields the highest fault detection accuracy when the monitored target variable is distributed heterogeneously across the fleet. We demonstrate this for bearing temperatures, a target variable whose normal behavior can vary widely from turbine to turbine. In our case studies, no turbine experiences a loss in model performance from participating in the federated learning process, which makes the federated learning strategy the superior one. Distributed learning increases normal behavior model training times by about a factor of ten because of communication overhead and slower model convergence.
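The following is a minimal sketch of the general idea, not the method used in the study. It assumes federated averaging (FedAvg) as the aggregation rule, a linear normal behavior model, and a few local gradient steps as the customization step; the abstract specifies none of these. All function names and the synthetic fleet data (a shared slope with turbine-specific temperature offsets) are hypothetical.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, steps=5):
    """Run a few gradient-descent steps on one turbine's private data."""
    w = w.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w

def federated_averaging(clients, n_features, rounds=50, lr=0.1, steps=5):
    """Assumed FedAvg loop: only model weights leave a turbine, never (X, y)."""
    w_global = np.zeros(n_features)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        local_ws = [local_update(w_global, X, y, lr, steps) for X, y in clients]
        # Server step: average the local weights, weighted by local sample count.
        w_global = np.average(local_ws, axis=0, weights=sizes)
    return w_global

def customize(w_global, X, y, lr=0.1, steps=20):
    """Assumed customization: fine-tune the global model on one turbine's data."""
    return local_update(w_global, X, y, lr, steps)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Synthetic, illustrative fleet: same slope, different offset per turbine,
# mimicking a target variable that is heterogeneous across the fleet.
rng = np.random.default_rng(0)
clients = []
for offset, n in [(0.0, 50), (0.5, 200), (1.0, 400)]:
    x = rng.uniform(0.0, 1.0, size=n)
    X = np.column_stack([x, np.ones_like(x)])  # feature plus bias column
    y = 2.0 * x + offset + rng.normal(0.0, 0.05, size=n)
    clients.append((X, y))

w_global = federated_averaging(clients, n_features=2)
for i, (X, y) in enumerate(clients):
    w_custom = customize(w_global, X, y)
    print(f"turbine {i}: global MSE={mse(w_global, X, y):.4f}, "
          f"customized MSE={mse(w_custom, X, y):.4f}")
```

In this toy setting the single global model cannot match every turbine's offset, so the locally fine-tuned weights fit each turbine's own data at least as well as the global ones. The many communication rounds in the loop also hint at why distributed training takes longer than training on pooled data.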