Clustering analysis of functional data, which comprise observations that evolve continuously over time or space, has gained increasing attention across scientific disciplines. In practice, functional data are often contaminated with measurement errors arising from imprecise instruments, sampling errors, or other sources. These errors can significantly distort the inherent data structure and lead to erroneous clustering outcomes. In this paper, we propose a simulation-based approach designed to mitigate the impact of measurement errors. The proposed method first estimates the distribution of the functional measurement errors from repeated measurements. The clustering algorithm is then applied to simulated data generated from the conditional distribution of the unobserved true functional data given the observed contaminated functional data, which adjusts the clustering for measurement error. Simulations show that the proposed method achieves better numerical performance than naive methods that neglect such errors. Applied to a childhood obesity study, the proposed method gave more reliable clustering results.
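
The three steps described above (estimate the error distribution from repeated measurements, simulate from the conditional distribution of the true curves given the observed ones, then cluster the simulated curves) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the distributional model, the clustering algorithm, or how results from different simulated data sets are combined. The sketch assumes curves observed on a common grid, Gaussian true curves and Gaussian errors (a simplification, since clustered data are not a single Gaussian), a plain k-means step, and a co-association matrix as the summary. All function names are hypothetical.

```python
import numpy as np

def estimate_error_cov(W):
    """Error covariance from replicates. W has shape (n, m, T):
    n subjects, m repeated measurements, T grid points (assumed layout)."""
    n, m, T = W.shape
    resid = (W - W.mean(axis=1, keepdims=True)).reshape(n * m, T)
    return resid.T @ resid / (n * (m - 1))

def conditional_moments(W):
    """Mean and covariance of the true curve given the replicate mean,
    under the assumed Gaussian model W_ij = X_i + U_ij."""
    n, m, T = W.shape
    W_bar = W.mean(axis=1)                      # replicate means, shape (n, T)
    mu = W_bar.mean(axis=0)
    sigma_u_bar = estimate_error_cov(W) / m     # error covariance of a replicate mean
    sigma_w = np.cov(W_bar, rowvar=False)
    # Cov(X) = Cov(W_bar) - Cov(U_bar); clip eigenvalues so it stays PSD.
    vals, vecs = np.linalg.eigh(sigma_w - sigma_u_bar)
    sigma_x = (vecs * np.clip(vals, 0.0, None)) @ vecs.T
    shrink = sigma_x @ np.linalg.pinv(sigma_x + sigma_u_bar)
    cond_mean = mu + (W_bar - mu) @ shrink.T
    cond_cov = sigma_x - shrink @ sigma_x
    return cond_mean, (cond_cov + cond_cov.T) / 2.0

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (a stand-in for any clustering algorithm)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def simulate_and_cluster(W, k, n_draws=20, seed=0):
    """Cluster n_draws simulated data sets and return the co-association
    matrix: the fraction of draws in which two subjects share a cluster."""
    rng = np.random.default_rng(seed)
    cond_mean, cond_cov = conditional_moments(W)
    vals, vecs = np.linalg.eigh(cond_cov)
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))   # root @ root.T ~ cond_cov
    n, T = cond_mean.shape
    co = np.zeros((n, n))
    for _ in range(n_draws):
        X_sim = cond_mean + rng.standard_normal((n, T)) @ root.T
        labels = kmeans(X_sim, k, rng)
        co += labels[:, None] == labels[None, :]
    return co / n_draws
```

Calling `simulate_and_cluster(W, k=2)` on an array of replicated curves returns an `n × n` matrix whose entries near 1 indicate pairs of subjects that are grouped together consistently across simulated data sets. A final partition could be read off this matrix, for example by hierarchical clustering on `1 - co`, but that aggregation step is an assumption of this sketch rather than something the abstract states.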