Multi-sensor data that track system operating behavior are now widely available from many engineering systems. The measurements from each sensor over time form a curve and can be viewed as functional data. Clustering these multivariate functional curves is important for studying the operating patterns of a system. One complication in such applications is the possible presence of sensors whose data carry no information relevant to the clustering. Hence, the clustering method should ideally be equipped with an automatic sensor selection procedure. Motivated by a real engineering application, we propose a functional data clustering method that simultaneously removes noninformative sensors and groups the functional curves into clusters using the informative sensors. Functional principal component analysis is used to transform the multivariate functional data into a coefficient matrix for data reduction. We then model the transformed data with a Gaussian mixture distribution to perform model-based clustering with variable selection. Three types of penalties, the individual, variable, and group penalties, are considered to achieve automatic variable selection. Extensive simulations are conducted to assess the clustering and variable selection performance of the proposed methods. Applying the proposed methods to an engineering system with multiple sensors shows their promise and reveals interesting patterns in the sensor data.
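The pipeline described above (FPCA to a coefficient matrix, then penalized Gaussian-mixture clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes all curves share a uniform grid, so FPCA is approximated by an SVD of the centered discretized curves for each sensor. It also assumes a mixture with a common diagonal covariance and an L1 penalty on every cluster-mean entry, which we take to be roughly analogous to the individual penalty; the variable and group penalties are not implemented. With standardized columns, the penalized M-step reduces to soft-thresholding each mean at $\lambda\sigma_d^2/n_k$. The function names (`fpca_scores`, `kmeans_labels`, `penalized_gmm`), the simulated sensors, and the tuning value `lam=20.0` are all hypothetical. A sensor is flagged as noninformative when every mean coefficient in its block is shrunk to zero in all clusters.

```python
import numpy as np


def fpca_scores(X, n_comp):
    """Discretized FPCA per sensor. X has shape (n_curves, n_sensors, n_grid).

    Returns an (n_curves, n_sensors * n_comp) coefficient matrix whose columns
    are grouped by sensor. Assumption: uniform grid, quadrature weights ignored.
    """
    _, p, _ = X.shape
    blocks = []
    for j in range(p):
        Xc = X[:, j, :] - X[:, j, :].mean(axis=0)          # center curves of sensor j
        U, s, _ = np.linalg.svd(Xc, full_matrices=False)
        blocks.append(U[:, :n_comp] * s[:n_comp])            # leading FPC scores
    return np.hstack(blocks)


def kmeans_labels(Z, K, n_iter=20):
    """Plain k-means with farthest-point seeding, used only to initialize EM."""
    centers = [Z[0]]
    for _ in range(1, K):
        d2 = ((Z[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(Z[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([Z[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(K)])
    return labels


def penalized_gmm(Z, K, lam, n_iter=100):
    """EM for a Gaussian mixture with a common diagonal covariance and an L1
    penalty lam * sum_k sum_d |mu_kd| on the cluster means (Z standardized)."""
    n = Z.shape[0]
    tau = np.eye(K)[kmeans_labels(Z, K)]                     # hard initial responsibilities
    nk = tau.sum(axis=0)
    mu = (tau.T @ Z) / nk[:, None]
    sigma2 = (tau[:, :, None] * (Z[:, None, :] - mu[None]) ** 2).sum(axis=(0, 1)) / n
    for _ in range(n_iter):
        # M-step: mixing weights, soft-thresholded means, common diagonal variances.
        nk = np.maximum(tau.sum(axis=0), 1e-10)
        pi = nk / n
        mu_tilde = (tau.T @ Z) / nk[:, None]
        thr = lam * sigma2[None, :] / nk[:, None]
        mu = np.sign(mu_tilde) * np.maximum(np.abs(mu_tilde) - thr, 0.0)
        sigma2 = (tau[:, :, None] * (Z[:, None, :] - mu[None]) ** 2).sum(axis=(0, 1)) / n
        sigma2 = np.maximum(sigma2, 1e-8)
        # E-step: posterior cluster probabilities, normalized in log space.
        logp = np.log(pi)[None, :] - 0.5 * (
            np.log(2 * np.pi * sigma2)[None, None, :]
            + (Z[:, None, :] - mu[None]) ** 2 / sigma2[None, None, :]
        ).sum(axis=2)
        logp -= logp.max(axis=1, keepdims=True)
        tau = np.exp(logp)
        tau /= tau.sum(axis=1, keepdims=True)
    return tau.argmax(axis=1), mu


# Hypothetical simulation: sensors 0 and 1 differ by cluster, sensor 2 does not.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
truth = np.repeat([0, 1], 30)
n = truth.size
X = np.empty((n, 3, t.size))
X[:, 0] = np.sin(2 * np.pi * t) + 1.5 * truth[:, None]      # level shift by cluster
X[:, 1] = (1 + truth[:, None]) * np.cos(2 * np.pi * t)      # amplitude change by cluster
X[:, 2] = rng.normal(size=(n, 1)) * np.sin(np.pi * t)       # varies, but not by cluster
X += rng.normal(scale=0.3, size=X.shape)

Z = fpca_scores(X, n_comp=2)
Z /= Z.std(axis=0)                                           # standardize each column
labels, mu = penalized_gmm(Z, K=2, lam=20.0)
informative = np.any(mu.reshape(2, 3, 2) != 0, axis=(0, 2))  # per-sensor flag
print(informative)
```

Because the soft-threshold scales with the within-cluster variance $\sigma_d^2$, coefficients that separate clusters tightly face a tiny threshold and survive, while coefficients with no cluster structure keep $\sigma_d^2 \approx 1$ and are shrunk to zero for a sufficiently large `lam`.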