Functional data analysis is widely applied across many fields. Although functional data are intrinsically infinite-dimensional, in practice they are observed only at a finite set of points, typically on a dense grid. Smoothing techniques are therefore often used to represent the observed data as functions. In this work, we propose a novel Bayesian approach for selecting basis functions when smoothing one or multiple curves simultaneously. Our method differs from other Bayesian approaches in two key ways: (i) it accounts for correlated errors, and (ii) it relies on a variational EM algorithm, which is faster than MCMC methods such as Gibbs sampling. Simulation studies show that our method effectively identifies the true underlying structure of the data across various scenarios and is applicable to different types of functional data. The variational EM algorithm recovers not only the basis coefficients and the correct set of basis functions but also the within-curve correlation. When applied to the motorcycle and temperature datasets, our method achieves comparable, and in some cases superior, adjusted $R^2$ relative to regression splines, smoothing splines, Bayesian LASSO, and LASSO. The proposed method is implemented in R, and the code is available at https://github.com/acarolcruz/VB-Bases-Selection.