We develop a robust Bayesian functional principal component analysis (FPCA) method based on skew elliptical classes of distributions. The proposed method captures the primary source of variation among curves even when the data are contaminated by abnormal observations. We model the observations with skew elliptical distributions, introducing skewness into the multivariate elliptically symmetric distribution through transformation and conditioning. To recast the covariance function, we employ an approximate spectral decomposition. We discuss the choice of prior specifications and give details of posterior inference, including the forms of the full conditional distributions, choices of hyperparameters, and model selection strategies. We also extend the model to accommodate sparse functional data with only a few observations per curve, yielding a more general Bayesian framework for FPCA. To assess performance, we conduct simulation studies comparing the proposed model with well-known frequentist methods and conventional Bayesian methods. The results show that our method outperforms existing approaches in the presence of outliers and performs competitively on outlier-free datasets. Finally, we illustrate the method by applying it to environmental and biological data to identify outlying functional data. The implementation of the proposed method and the applications are available at https://github.com/SFU-Stat-ML/RBFPCA.
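As a rough illustration of what an approximate spectral decomposition of a covariance function looks like, the sketch below keeps only the leading eigenpairs of a sample covariance evaluated on a grid. It is not the authors' Bayesian procedure: it is a plain, non-robust calculation that assumes densely observed curves on an equally spaced grid, and it involves no priors or skew elliptical distributions. The function name `truncated_spectral_decomposition` and its arguments are hypothetical and do not come from the RBFPCA repository.

```python
import numpy as np


def truncated_spectral_decomposition(curves, grid, n_components):
    """Approximate a covariance function by its leading eigenpairs.

    Hypothetical helper for illustration only.
    curves: array of shape (n_curves, n_grid), one row per curve.
    grid:   array of shape (n_grid,), assumed equally spaced.
    """
    curves = np.asarray(curves, dtype=float)
    grid = np.asarray(grid, dtype=float)
    step = grid[1] - grid[0]  # assumption: equally spaced grid

    # Sample covariance of the curves evaluated on the grid.
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / (curves.shape[0] - 1)

    # Eigen-decompose the discretised covariance operator (cov * step).
    eigvals, eigvecs = np.linalg.eigh(cov * step)
    order = np.argsort(eigvals)[::-1][:n_components]
    lambdas = eigvals[order]

    # Rescale eigenvectors so that sum(phi**2) * step == 1 for each component.
    phis = eigvecs[:, order] / np.sqrt(step)

    # Truncated expansion: sum_k lambda_k * phi_k(s) * phi_k(t).
    cov_approx = (phis * lambdas) @ phis.T
    return lambdas, phis, cov_approx
```

Calling `truncated_spectral_decomposition(curves, grid, 2)` returns the two largest eigenvalues, the matching eigenfunctions evaluated on the grid, and the rank-2 approximation of the sample covariance. In a robust Bayesian treatment such as the one described above, the eigenvalues and scores would instead be given priors and sampled from their posterior distributions.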