We introduce a new clustering method for classifying functional data sets by their probability law, that is, a procedure that aims to assign two data sets to the same cluster if and only if they were generated by the same underlying distribution. The method is unsupervised and nonparametric, which allows exploratory analysis under few assumptions about the data. We give rigorous finite bounds on the classification error, together with an objective heuristic that consistently selects the best partition in a data-driven manner. Finally, we apply the procedure to simulated data to illustrate its performance on several parametric model classes of functional data.