Wind turbine manufacturers collect terabytes of data from their fleets every day. These data contain valuable real-time information for turbine health diagnostics and performance monitoring, and for predicting rare failures and the remaining service life of critical parts. Yet this wealth of fleet data remains inaccessible to operators, utility companies, and researchers, because manufacturers prefer to keep their fleets' turbine data private for strategic business reasons. The lack of data access impedes opportunities such as improving data-driven turbine operation and maintenance strategies and reducing downtimes. We present a distributed, federated machine learning approach that leaves the data on the wind turbines, preserving data privacy as desired by manufacturers, while still enabling fleet-wide learning on those local data. We demonstrate in two case studies that wind turbines lacking representative training data obtain more accurate fault detection models with federated learning, while no turbine experiences a loss in model performance by participating in the federated learning process. Compared with conventional training, the average model training time increases by a factor of up to 14 in federated training because of additional communication and overhead operations. Model training times might therefore constitute an impediment that needs to be further explored and alleviated in federated learning applications, especially for large wind turbine fleets.
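To make the idea of fleet-wide learning on local data concrete, the following is a minimal sketch of one common federated scheme, federated averaging. It is an illustration under stated assumptions and not necessarily the training procedure used in the case studies: the linear model, the synthetic data, and the names `local_update`, `federated_average`, `make_turbine_data`, and `run_federated` are all hypothetical. Each simulated turbine trains on its own data, and only model parameters are exchanged with the aggregator.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one turbine's private data.

    The model is y = w * x + b, a stand-in (assumption) for a real
    fault detection model. The raw data never leave this function.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Average client parameters, weighted by local sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def make_turbine_data(n, rng):
    """Synthetic data (assumption): y = 2x + 1 plus small Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)))
    return data


def run_federated(turbines, rounds=200):
    """Alternate local training and server-side averaging."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each turbine starts from the current global model.
        updates = [local_update(global_weights, d) for d in turbines]
        # Only the parameters are sent to the aggregator.
        sizes = [len(d) for d in turbines]
        global_weights = federated_average(updates, sizes)
    return global_weights


rng = random.Random(0)
# Two data-rich turbines and one with only five samples.
turbines = [make_turbine_data(n, rng) for n in (200, 150, 5)]
w, b = run_federated(turbines)
```

In this sketch the turbine with only five samples receives a global model shaped mostly by the data-rich turbines, which mirrors the benefit the case studies report for turbines with scarce training data. Every round also requires one exchange of parameters per turbine, which is where the additional communication overhead of federated training arises.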