Anomaly detection plays a crucial role in predictive maintenance for wind turbines, yet comparing algorithms is difficult because domain-specific public datasets are scarce. Many comparisons of different approaches use either benchmarks composed of data from many different domains, inaccessible data, or one of the few publicly available datasets, which lack detailed information about the faults. Moreover, many publications highlight only a couple of case studies in which fault detection was successful. With this paper we publish a high-quality dataset that contains data from 36 wind turbines across 3 different wind farms and, to the best of our knowledge, the most detailed fault information of any public wind turbine dataset. The new dataset contains 89 years' worth of real-world wind turbine operating data, distributed across 44 labeled time frames for anomalies that led up to faults and 51 time series representing normal behavior. Additionally, the quality of the training data is ensured by turbine-status-based labels for each data point. Furthermore, we propose a new scoring method, called CARE (Coverage, Accuracy, Reliability and Earliness), which takes advantage of the depth of information in the dataset to identify a good all-around anomaly detection model. This score considers the anomaly detection performance, the ability to recognize normal behavior properly, and the capability to raise as few false alarms as possible while simultaneously detecting anomalies early.