This paper presents a novel approach to functional principal component analysis (FPCA) in Bayes spaces for settings where densities are the objects of analysis but only a few individual samples from each density are observed. Rather than relying on a two-step approach that first estimates the underlying densities, which can be inaccurate when the number of samples per density is small or heterogeneous, we use the observed data directly and thereby account for all sources of uncertainty. To respect the constrained nature of densities, we base our approach on Bayes spaces, which extend the Aitchison geometry for compositional data to density functions. For modeling, we exploit the isometric isomorphism, given by the centered log-ratio transformation, between the Bayes space and $\mathbb{L}_0^2$, the subspace of $\mathbb{L}^2$ of functions that integrate to zero. Because only discrete draws from each density are observed, we treat the underlying functional densities as latent variables within a maximum likelihood framework and employ a Monte Carlo Expectation Maximization (MCEM) algorithm for model estimation. The resulting estimates are useful for exploratory analyses of density data, for dimension reduction in subsequent analyses, and for improved preprocessing of sparsely sampled density data compared to existing methods. We apply the proposed method to the distribution of maximum daily temperatures in Berlin during the summer months over the last 70 years and to the distribution of rental prices in the districts of Munich.
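As a minimal illustration of the centered log-ratio (clr) transformation mentioned in the abstract, the sketch below maps a strictly positive density on a bounded interval to a function that integrates to zero, and back. It assumes the Lebesgue reference measure on the interval $I = [a, b]$, for which $\operatorname{clr}(f) = \log f - \frac{1}{b-a}\int_I \log f(x)\,dx$. The grid discretization, the trapezoidal rule, the example density, and the helper names `integrate`, `clr`, and `clr_inverse` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def integrate(y, x):
    """Trapezoidal rule for values y on the grid x (hypothetical helper)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def clr(f, x):
    """Centered log-ratio transform of a strictly positive density f on grid x.

    Subtracts the mean of log f over the interval, so the result
    integrates to zero (numerically, under the trapezoidal rule).
    """
    log_f = np.log(f)
    mean_log = integrate(log_f, x) / (x[-1] - x[0])
    return log_f - mean_log


def clr_inverse(g, x):
    """Map a function g back to a density: exp(g) normalized to integrate to one."""
    unnormalized = np.exp(g - np.max(g))  # shift by the max for numerical stability
    return unnormalized / integrate(unnormalized, x)


# Example density (an assumption): a Gaussian bump restricted to [0, 1].
x = np.linspace(0.0, 1.0, 201)
f = np.exp(-0.5 * ((x - 0.4) / 0.15) ** 2)
f = f / integrate(f, x)  # normalize so that f integrates to one

g = clr(f, x)                 # element of the zero-integral subspace
f_back = clr_inverse(g, x)    # round trip back to the density

print("integral of clr(f):", integrate(g, x))
print("max round-trip error:", float(np.max(np.abs(f_back - f))))
```

In this discretized setting, standard $\mathbb{L}^2$ tools such as an ordinary principal component analysis can be applied to the transformed functions `g`, and results can be mapped back to densities with `clr_inverse`. The method described in the abstract goes further by treating the densities as latent variables rather than transforming pre-estimated densities.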