Vibration-based condition monitoring systems are receiving increasing attention because they can accurately identify different conditions by capturing dynamic features over a broad frequency range. However, there is little research on clustering approaches for vibration data, and the resulting solutions are often optimized for a single data set. In this work, we present an extensive comparison of three clustering algorithms, K-means, OPTICS, and Gaussian mixture model (GMM) clustering, applied to statistical features extracted from the time and frequency domains of vibration data sets. Furthermore, we investigate how feature combinations, feature selection using principal component analysis (PCA), and the specified number of clusters influence the performance of the clustering algorithms. We conducted this comparison as a grid search on three different benchmark data sets. Our results show that averaging features (mean, median) and variance-based features (standard deviation, interquartile range) performed significantly better than shape-based features (skewness, kurtosis). In addition, K-means slightly outperformed GMM on these data sets, whereas OPTICS performed significantly worse. Neither feature combinations nor PCA feature selection resulted in any significant performance improvement. Clustering algorithms performed better as the specified number of clusters increased, although there were some algorithm-specific restrictions.
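To make the setup described in the abstract more concrete, the following is a minimal sketch of a grid search over single statistical features and specified cluster counts. It is not the authors' implementation. Several things here are assumptions made for illustration: the synthetic two-condition signals, the helper names (`time_domain_features`, `kmeans_1d`, `purity`, `make_windows`, `grid_search`), the hand-written one-dimensional K-means, and purity as the score (the abstract does not name its evaluation metric). The sketch covers only time-domain features and K-means; frequency-domain features, OPTICS, GMM, and PCA are left out.

```python
import math
import random
import statistics
from itertools import product


def time_domain_features(signal):
    """Return the six statistical features named in the abstract for one window.

    Assumes the window is not constant (std > 0), otherwise the shape
    features are undefined.
    """
    n = len(signal)
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    q1, _, q3 = statistics.quantiles(signal, n=4)
    # Third and fourth central moments (population form).
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    return {
        "mean": mean,
        "median": statistics.median(signal),
        "std": std,
        "iqr": q3 - q1,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / std ** 4,
    }


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd iterations on scalar values; returns one label per value."""
    ordered = sorted(values)
    # Deterministic initialisation: spread initial centres over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = statistics.fmean(members)
    return labels


def purity(labels, truth):
    """Fraction of windows that belong to the majority true class of their cluster."""
    correct = 0
    for c in set(labels):
        classes = [t for lab, t in zip(labels, truth) if lab == c]
        correct += max(classes.count(t) for t in set(classes))
    return correct / len(labels)


def make_windows(seed=0, per_class=20, length=500):
    """Hypothetical data: two 'conditions' that differ only in overall amplitude."""
    rng = random.Random(seed)
    windows, truth = [], []
    for label, amplitude in enumerate((1.0, 3.0)):
        for _ in range(per_class):
            windows.append([
                amplitude * (math.sin(2 * math.pi * 0.05 * t) + rng.gauss(0.0, 0.2))
                for t in range(length)
            ])
            truth.append(label)
    return windows, truth


def grid_search(windows, truth, feature_names, cluster_counts):
    """Score every (feature, number of clusters) pair; returns a dict of purities."""
    table = [time_domain_features(w) for w in windows]
    results = {}
    for name, k in product(feature_names, cluster_counts):
        labels = kmeans_1d([row[name] for row in table], k)
        results[(name, k)] = purity(labels, truth)
    return results


if __name__ == "__main__":
    windows, truth = make_windows()
    names = ["mean", "median", "std", "iqr", "skewness", "kurtosis"]
    scores = grid_search(windows, truth, names, [2, 3, 4])
    for (name, k), score in sorted(scores.items()):
        print(f"{name:9s} k={k}: purity={score:.2f}")
```

Because the toy conditions differ only in amplitude, the scale-sensitive features (standard deviation, interquartile range) separate them by construction, so the output says nothing about the benchmark results reported in the abstract. Purity also tends to rise as more clusters are allowed, which is one reason a real comparison would likely need a metric that accounts for the number of clusters.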