Gaussian processes are an indispensable tool for clustering functional data, owing to their flexibility and inherent uncertainty quantification. However, when the functional data are observed over a large grid (say, of length $p$), Gaussian process clustering quickly becomes infeasible, incurring $O(p^2)$ space complexity and $O(p^3)$ time complexity per iteration, which prevents its natural adaptation to large environmental applications. To ensure the scalability of Gaussian process clustering in such applications, we propose to embed the popular Vecchia approximation for Gaussian processes at the heart of the clustering task, provide theoretical insights that are crucial for algorithmic design, and finally develop a computationally efficient expectation maximization (EM) algorithm. Empirical evidence of the utility of our proposal is provided via simulations and an analysis of polar temperature anomaly datasets (\href{https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series}{noaa.gov}).
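To illustrate where the $O(p^3)$ cost comes from and how a Vecchia approximation avoids it, the sketch below compares an exact Gaussian log-density (a full $p \times p$ Cholesky factorization) with a Vecchia-style approximation $\log p(y) \approx \sum_i \log p(y_i \mid y_{c(i)})$, where each conditioning set $c(i)$ holds at most $m$ earlier grid points. This is a minimal sketch under stated assumptions, not the authors' method: the exponential covariance, the natural left-to-right ordering of a one-dimensional grid, the "$m$ immediate predecessors" conditioning rule, and the helper names `exp_cov`, `exact_loglik`, and `vecchia_loglik` are all hypothetical choices made for illustration, and no clustering or EM step is shown. Each term only solves an $m \times m$ system, so one evaluation costs roughly $O(p\,m^3)$ time instead of $O(p^3)$.

```python
import numpy as np


def exp_cov(s, t, sigma2=1.0, ell=0.2):
    """Assumed exponential covariance k(s, t) = sigma2 * exp(-|s - t| / ell)."""
    return sigma2 * np.exp(-np.abs(s[:, None] - t[None, :]) / ell)


def exact_loglik(y, x, **kw):
    """Exact zero-mean Gaussian log-density: O(p^2) memory, O(p^3) time."""
    K = exp_cov(x, x, **kw)
    L = np.linalg.cholesky(K)        # full p x p factorization
    alpha = np.linalg.solve(L, y)    # L^{-1} y, so y^T K^{-1} y = ||alpha||^2
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()          # -0.5 * log det K
            - 0.5 * len(y) * np.log(2 * np.pi))


def vecchia_loglik(y, x, m=5, **kw):
    """Vecchia-style approximation: sum_i log p(y_i | y_c(i)).

    Hypothetical conditioning rule: c(i) = the (up to) m grid points
    immediately preceding i in the natural ordering of the 1-D grid.
    """
    total = 0.0
    for i in range(len(y)):
        c = np.arange(max(0, i - m), i)                 # conditioning set
        k_ii = exp_cov(x[i:i + 1], x[i:i + 1], **kw)[0, 0]
        if len(c) == 0:
            mu, var = 0.0, k_ii                         # marginal of y_0
        else:
            K_cc = exp_cov(x[c], x[c], **kw)            # small m x m block
            k_ci = exp_cov(x[c], x[i:i + 1], **kw)[:, 0]
            w = np.linalg.solve(K_cc, k_ci)             # kriging weights
            mu = w @ y[c]                               # conditional mean
            var = k_ii - w @ k_ci                       # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return total


# Usage: simulate one curve on a grid of length p = 200 and compare.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.linalg.cholesky(exp_cov(x, x)) @ rng.standard_normal(len(x))
print(exact_loglik(y, x), vecchia_loglik(y, x, m=5))
```

Because the exponential covariance on a one-dimensional grid is Markov, conditioning on the single previous point already recovers the exact log-density here, so the two printed values agree up to rounding. For non-Markov kernels (for example, a squared exponential) the same code gives a genuine approximation whose accuracy typically improves as $m$ grows.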