Functional data analysis is widely applied across many fields. Although functional data are intrinsically infinite-dimensional, in practice they are observed only at a finite set of points, typically on a dense grid. Smoothing techniques are therefore often used to represent the observed data as functions. In this work, we propose a novel Bayesian approach for selecting basis functions when smoothing one or multiple curves simultaneously. Our method differs from other Bayesian approaches in two key ways: (i) it accounts for correlated errors, and (ii) it uses a variational Expectation-Maximization (VEM) algorithm, which is faster than Markov chain Monte Carlo (MCMC) methods such as Gibbs sampling. Simulation studies demonstrate that our method effectively identifies the true underlying structure of the data across various scenarios and is applicable to different types of functional data. Our VEM algorithm not only recovers the basis coefficients and the correct set of basis functions but also estimates the within-curve correlation. When applied to the motorcycle, LIDAR (LIght Detection And Ranging) experiment, and Canadian weather datasets, our method achieves comparable, and in some cases superior, adjusted R² relative to regression splines, smoothing splines, the least absolute shrinkage and selection operator (LASSO), and Bayesian LASSO. The proposed method is implemented in R, and the code is available at https://github.com/acarolcruz/VB-Bases-Selection